You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Parth Chandra <pa...@apache.org> on 2015/09/01 07:24:00 UTC

Re: Potential resource for large scale testing

Hi Edmon,
  Sorry no one seems to have got back to you on this.
  We are in the process of publishing a test suite for regression testing
Drill and the cluster you have (even a few nodes ) would be a great
resource for folks to run the test suite. Rahul, et al are working on this
and I would suggest watching out for Rahul's posts on the topic.

Parth

On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <eb...@gmail.com> wrote:

> Hey folks,
>
> As we discussed today on a hangout, this is a machine that we have at
> JICS/NICS
> where I have Drill installed and where I could set up a test cluster over
> few nodes.
>
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
>
> Note that each node is:
> - 2x8-core Intel® Xeon® E5-2670 processors
> - 256 GB of memory
> - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> - 960 GB of SSD storage
>
> Would someone advise on what would be an interesting test setup?
>
> Thank you,
> Edmon
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
Steven - send me please your contact info (email for now, preferably Apache
if you have one or Dremio) to ebegoliATutkDOTedu.

Thank you,
Edmon

On Fri, Sep 25, 2015 at 12:18 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> That is great news! From the Dremio side, I propose working with Steven.
> Let's start taking advantage of this awesome resource!
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Sep 23, 2015 at 5:34 PM, Edmon Begoli <eb...@gmail.com> wrote:
>
> > This request has been approved. I will get more details tomorrow.
> >
> > I could add to the resource few members of the Drill team, maybe one
> person
> > from MapR and one from Dremio
> > who can have access and can assist in configuring (or instructing
> resource
> > sysadmins) how to run the big tests, if desired.
> > They will need to apply and get RSA tokens.
> >
> > Then we can talk how to make this resource a part of the regular testing
> > and benchmarking process.
> >
> > Thank you,
> > Edmon
> >
> > On Fri, Sep 18, 2015 at 8:00 PM, Edmon Begoli <eb...@gmail.com> wrote:
> >
> > > I requested 5000 hours a year on Beacon for Apache Drill for high
> > > performance benchmarking, testing and optimization.
> > > I will let you know of the resolution pretty soon. I expect these
> > > resources to be awarded to the project.
> > >
> > >
> > > On Fri, Sep 18, 2015 at 6:22 PM, Parth Chandra <pc...@maprtech.com>
> > > wrote:
> > >
> > >> +1 on running the build and tests.
> > >> If we need to run some kind of stress tests, we could consider running
> > >> TPC-H/TPC-DS at large scale factors.
> > >>
> > >> On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <ja...@dremio.com>
> > >> wrote:
> > >>
> > >> > Not offhand. It really depends on how the time would work. For
> > example,
> > >> it
> > >> > would be nice if we had an automated perfectly fressh (no .m2/repo)
> > >> nightly
> > >> > build and full test suite run so people can always check the status.
> > >> Maybe
> > >> > we use this hardware for that?
> > >> >
> > >> > --
> > >> > Jacques Nadeau
> > >> > CTO and Co-Founder, Dremio
> > >> >
> > >> > On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
> > >> > challapallirahul@gmail.com> wrote:
> > >> >
> > >> > > Edmon,
> > >> > >
> > >> > > We do have the tests available now [1].
> > >> > >
> > >> > > Jacques,
> > >> > >
> > >> > > You expressed interest in making these tests available on an
> Amazon
> > >> > cluster
> > >> > > so that users need not have physical hardware required to run
> these
> > >> > tests.
> > >> > > Do you have any specific thoughts on how to leverage the resources
> > >> that
> > >> > > Edmon is willing to contribute (performance testing?)
> > >> > >
> > >> > >
> > >> > > [1] https://github.com/mapr/drill-test-framework
> > >> > >
> > >> > > - Rahul
> > >> > >
> > >> > > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > I discussed this idea of bringing large compute resource
> yesterday
> > >> with
> > >> > > my
> > >> > > > team at JICS to the project, and there was a general consensus
> > that
> > >> it
> > >> > > can
> > >> > > > be committed.
> > >> > > >
> > >> > > > I will request and hopefully commit pretty large set of
> > >> > > > clustered CPU/storage resources for the needs of a Drill
> project.
> > >> > > >
> > >> > > > I will be the PI for the resource, and could give access to
> > >> whomever we
> > >> > > > want to designate from the Drill project side.
> > >> > > >
> > >> > > > Just let me know. I should have project approved within few
> days.
> > >> > > >
> > >> > > > Edmon
> > >> > > >
> > >> > > >
> > >> > > > On Saturday, September 5, 2015, Edmon Begoli <ebegoli@gmail.com
> >
> > >> > wrote:
> > >> > > >
> > >> > > > > Ted,
> > >> > > > >
> > >> > > > > It is actually very easy and painless to do what I am
> > proposing. I
> > >> > > > > probably made it sound far more bureaucratic/legalistic than
> it
> > >> > really
> > >> > > > is.
> > >> > > > >
> > >> > > > > Researchers and projects from across the globe can apply for
> > >> cycles
> > >> > on
> > >> > > > > Beacon or any other HPC platform we run. (Beacon is by far the
> > >> best
> > >> > and
> > >> > > > we
> > >> > > > > already have a setup to run Spark and Hive on it. (We just
> > >> published
> > >> > > > paper
> > >> > > > > about it at XSEDE on integrating PBS/TORQUE scheduler with
> Spark
> > >> to
> > >> > run
> > >> > > > > JVM-bound jobs))
> > >> > > > >
> > >> > > > > As for use of resources, at the end of year we need to submit
> > >> reports
> > >> > > for
> > >> > > > > all the projects that used compute resources and how.
> > >> > > > > It is part of our mission, as being one of the XSEDE centers,
> to
> > >> > > > > help promote the advancement of the science and technology.
> > >> > > > > Reports from Principal Investigators (PI) show how we did it.
> In
> > >> this
> > >> > > > > case, I can be a PI and have any/someone from the Drill team
> > >> assigned
> > >> > > > > access.
> > >> > > > >
> > >> > > > > I don't think there are any IP issues. Open source project,
> open
> > >> > > research
> > >> > > > > institution, use of resources for testing and benchmarking. We
> > >> could
> > >> > > > > actually make JICS a benchmarking site for Drill (and even
> other
> > >> > Apache
> > >> > > > > projects).
> > >> > > > >
> > >> > > > > We'll discuss other details in a hangout. I am also planning
> to
> > >> brief
> > >> > > my
> > >> > > > > team next Wednesday on the plan for the use of resources.
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Edmon
> > >> > > > >
> > >> > > > >
> > >> > > > > On Saturday, September 5, 2015, Ted Dunning <
> > >> ted.dunning@gmail.com
> > >> > > > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>>
> wrote:
> > >> > > > >
> > >> > > > >> Edmon,
> > >> > > > >>
> > >> > > > >> This is very interesting.  I am sure that public
> > >> acknowledgements of
> > >> > > > >> contributions are easily managed.
> > >> > > > >>
> > >> > > > >> What might be even more useful for you would be small scale
> > >> > > > publications,
> > >> > > > >> especially about the problems of shoe-horning real-world data
> > >> > objects
> > >> > > > into
> > >> > > > >> the quasi-relational model of Drill.
> > >> > > > >>
> > >> > > > >> What would be problematic (and what is probably just a matter
> > of
> > >> > > > >> nomenclature) is naming of an institution by the Apache
> > specific
> > >> > term
> > >> > > > >> "committer" (you said commitment). Individuals at your
> > >> institution
> > >> > > would
> > >> > > > >> absolutely be up for being committers as they demonstrate a
> > track
> > >> > > record
> > >> > > > >> of
> > >> > > > >> contribution.
> > >> > > > >>
> > >> > > > >> I would expect no need for any paperwork between JICS and
> > Apache
> > >> > > unless
> > >> > > > >> you
> > >> > > > >> would like to execute a corporate contributor license to
> ensure
> > >> that
> > >> > > > >> particular individuals are specifically empowered to
> contribute
> > >> > code.
> > >> > > I
> > >> > > > >> don't know that the position of JICS is relative to
> > intellectual
> > >> > > > property,
> > >> > > > >> though, so it might be worth checking out institutional
> policy
> > on
> > >> > your
> > >> > > > >> side
> > >> > > > >> on how individuals can contribute to open source projects. It
> > >> > > shouldn't
> > >> > > > be
> > >> > > > >> too hard since there are quite a number of NSF funded people
> > who
> > >> do
> > >> > > > >> contribute.
> > >> > > > >>
> > >> > > > >>
> > >> > > > >>
> > >> > > > >>
> > >> > > > >>
> > >> > > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <
> > ebegoli@gmail.com>
> > >> > > wrote:
> > >> > > > >>
> > >> > > > >> > I can work with my institution and the NSF that we committ
> > the
> > >> > time
> > >> > > on
> > >> > > > >> the
> > >> > > > >> > Beacon supercomputing cluster to Apache and the Drill
> > project.
> > >> > Maybe
> > >> > > > 20
> > >> > > > >> > hours a month for 4-5 nodes.
> > >> > > > >> >
> > >> > > > >> > I have discretionary hours that I can put in, and I can,
> with
> > >> our
> > >> > > > >> > HPC admins, create deploy scripts on few clustered machines
> > >> (these
> > >> > > are
> > >> > > > >> all
> > >> > > > >> > very large boxes with 16 cores, 256 GB, 40gb IB
> interconnect,
> > >> and
> > >> > > > >> > with local 1 TB SSD each). There is also Medusa 10 PB
> > >> filesystem
> > >> > > > >> attached
> > >> > > > >> > but HDFS over local drives would probably be better.
> > >> > > > >> > They are otherwise just a regular machines, and run regular
> > >> JVMs
> > >> > on
> > >> > > > >> Linux.
> > >> > > > >> >
> > >> > > > >> > We can also get Rahul an access with a secure token to
> setup
> > >> > > > >> > and run stress/performance/integration tests for Drill. I
> can
> > >> > > actually
> > >> > > > >> help
> > >> > > > >> > there as well. This can be automated to run tests and
> collect
> > >> > > results.
> > >> > > > >> >
> > >> > > > >> > I think that the only requirement would be that the JICS
> team
> > >> be
> > >> > > named
> > >> > > > >> for
> > >> > > > >> > commitment because both NSF/XSEDE and UT like to see the
> > >> resources
> > >> > > > >> > being officially used and acknowledged. They are there to
> > >> support
> > >> > > open
> > >> > > > >> and
> > >> > > > >> > academic research; open source projects fit well.
> > >> > > > >> >
> > >> > > > >> > If this sounds OK with the project PMCs, I can start the
> > >> process
> > >> > of
> > >> > > > >> > allocation, accounts creation, setup.
> > >> > > > >> >
> > >> > > > >> > I would also, as a CDO, of JICS sign whatever standard
> papers
> > >> with
> > >> > > > >> > the Apache organization.
> > >> > > > >> >
> > >> > > > >> > With all this being said, let me know please if this is
> > >> something
> > >> > we
> > >> > > > >> want
> > >> > > > >> > to pursue.
> > >> > > > >> >
> > >> > > > >> > Thank you,
> > >> > > > >> > Edmon
> > >> > > > >> >
> > >> > > > >> > On Tuesday, September 1, 2015, Jacques Nadeau <
> > >> jacques@dremio.com
> > >> > >
> > >> > > > >> wrote:
> > >> > > > >> >
> > >> > > > >> > > I spent a bunch of time looking at the Phi coprocessors
> and
> > >> > forgot
> > >> > > > to
> > >> > > > >> get
> > >> > > > >> > > back to the thread. I'd love it if someone spent some
> time
> > >> > looking
> > >> > > > at
> > >> > > > >> > > leveraging them (since Drill is frequently processor
> > bound).
> > >> > Any
> > >> > > > >> takers?
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > > --
> > >> > > > >> > > Jacques Nadeau
> > >> > > > >> > > CTO and Co-Founder, Dremio
> > >> > > > >> > >
> > >> > > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> > >> > > parthc@apache.org
> > >> > > > >> > > <javascript:;>> wrote:
> > >> > > > >> > >
> > >> > > > >> > > > Hi Edmon,
> > >> > > > >> > > >   Sorry no one seems to have got back to you on this.
> > >> > > > >> > > >   We are in the process of publishing a test suite for
> > >> > > regression
> > >> > > > >> > testing
> > >> > > > >> > > > Drill and the cluster you have (even a few nodes )
> would
> > >> be a
> > >> > > > great
> > >> > > > >> > > > resource for folks to run the test suite. Rahul, et al
> > are
> > >> > > working
> > >> > > > >> on
> > >> > > > >> > > this
> > >> > > > >> > > > and I would suggest watching out for Rahul's posts on
> the
> > >> > topic.
> > >> > > > >> > > >
> > >> > > > >> > > > Parth
> > >> > > > >> > > >
> > >> > > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> > >> > > ebegoli@gmail.com
> > >> > > > >> > > <javascript:;>> wrote:
> > >> > > > >> > > >
> > >> > > > >> > > > > Hey folks,
> > >> > > > >> > > > >
> > >> > > > >> > > > > As we discussed today on a hangout, this is a machine
> > >> that
> > >> > we
> > >> > > > >> have at
> > >> > > > >> > > > > JICS/NICS
> > >> > > > >> > > > > where I have Drill installed and where I could set
> up a
> > >> test
> > >> > > > >> cluster
> > >> > > > >> > > over
> > >> > > > >> > > > > few nodes.
> > >> > > > >> > > > >
> > >> > > > >> > > > >
> > >> > > > >> > >
> > >> > > > >>
> > >> > >
> > >>
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > >> > > > >> > > > >
> > >> > > > >> > > > > Note that each node is:
> > >> > > > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > >> > > > >> > > > > - 256 GB of memory
> > >> > > > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of
> > >> memory
> > >> > > each
> > >> > > > >> > > > > - 960 GB of SSD storage
> > >> > > > >> > > > >
> > >> > > > >> > > > > Would someone advise on what would be an interesting
> > test
> > >> > > setup?
> > >> > > > >> > > > >
> > >> > > > >> > > > > Thank you,
> > >> > > > >> > > > > Edmon
> > >> > > > >> > > > >
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Jacques Nadeau <ja...@dremio.com>.
That is great news! From the Dremio side, I propose working with Steven.
Let's start taking advantage of this awesome resource!

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Sep 23, 2015 at 5:34 PM, Edmon Begoli <eb...@gmail.com> wrote:

> This request has been approved. I will get more details tomorrow.
>
> I could add to the resource few members of the Drill team, maybe one person
> from MapR and one from Dremio
> who can have access and can assist in configuring (or instructing resource
> sysadmins) how to run the big tests, if desired.
> They will need to apply and get RSA tokens.
>
> Then we can talk how to make this resource a part of the regular testing
> and benchmarking process.
>
> Thank you,
> Edmon
>
> On Fri, Sep 18, 2015 at 8:00 PM, Edmon Begoli <eb...@gmail.com> wrote:
>
> > I requested 5000 hours a year on Beacon for Apache Drill for high
> > performance benchmarking, testing and optimization.
> > I will let you know of the resolution pretty soon. I expect these
> > resources to be awarded to the project.
> >
> >
> > On Fri, Sep 18, 2015 at 6:22 PM, Parth Chandra <pc...@maprtech.com>
> > wrote:
> >
> >> +1 on running the build and tests.
> >> If we need to run some kind of stress tests, we could consider running
> >> TPC-H/TPC-DS at large scale factors.
> >>
> >> On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >>
> >> > Not offhand. It really depends on how the time would work. For
> example,
> >> it
> >> > would be nice if we had an automated perfectly fressh (no .m2/repo)
> >> nightly
> >> > build and full test suite run so people can always check the status.
> >> Maybe
> >> > we use this hardware for that?
> >> >
> >> > --
> >> > Jacques Nadeau
> >> > CTO and Co-Founder, Dremio
> >> >
> >> > On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
> >> > challapallirahul@gmail.com> wrote:
> >> >
> >> > > Edmon,
> >> > >
> >> > > We do have the tests available now [1].
> >> > >
> >> > > Jacques,
> >> > >
> >> > > You expressed interest in making these tests available on an Amazon
> >> > cluster
> >> > > so that users need not have physical hardware required to run these
> >> > tests.
> >> > > Do you have any specific thoughts on how to leverage the resources
> >> that
> >> > > Edmon is willing to contribute (performance testing?)
> >> > >
> >> > >
> >> > > [1] https://github.com/mapr/drill-test-framework
> >> > >
> >> > > - Rahul
> >> > >
> >> > > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com>
> >> wrote:
> >> > >
> >> > > > I discussed this idea of bringing large compute resource yesterday
> >> with
> >> > > my
> >> > > > team at JICS to the project, and there was a general consensus
> that
> >> it
> >> > > can
> >> > > > be committed.
> >> > > >
> >> > > > I will request and hopefully commit pretty large set of
> >> > > > clustered CPU/storage resources for the needs of a Drill project.
> >> > > >
> >> > > > I will be the PI for the resource, and could give access to
> >> whomever we
> >> > > > want to designate from the Drill project side.
> >> > > >
> >> > > > Just let me know. I should have project approved within few days.
> >> > > >
> >> > > > Edmon
> >> > > >
> >> > > >
> >> > > > On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > Ted,
> >> > > > >
> >> > > > > It is actually very easy and painless to do what I am
> proposing. I
> >> > > > > probably made it sound far more bureaucratic/legalistic than it
> >> > really
> >> > > > is.
> >> > > > >
> >> > > > > Researchers and projects from across the globe can apply for
> >> cycles
> >> > on
> >> > > > > Beacon or any other HPC platform we run. (Beacon is by far the
> >> best
> >> > and
> >> > > > we
> >> > > > > already have a setup to run Spark and Hive on it. (We just
> >> published
> >> > > > paper
> >> > > > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark
> >> to
> >> > run
> >> > > > > JVM-bound jobs))
> >> > > > >
> >> > > > > As for use of resources, at the end of year we need to submit
> >> reports
> >> > > for
> >> > > > > all the projects that used compute resources and how.
> >> > > > > It is part of our mission, as being one of the XSEDE centers, to
> >> > > > > help promote the advancement of the science and technology.
> >> > > > > Reports from Principal Investigators (PI) show how we did it. In
> >> this
> >> > > > > case, I can be a PI and have any/someone from the Drill team
> >> assigned
> >> > > > > access.
> >> > > > >
> >> > > > > I don't think there are any IP issues. Open source project, open
> >> > > research
> >> > > > > institution, use of resources for testing and benchmarking. We
> >> could
> >> > > > > actually make JICS a benchmarking site for Drill (and even other
> >> > Apache
> >> > > > > projects).
> >> > > > >
> >> > > > > We'll discuss other details in a hangout. I am also planning to
> >> brief
> >> > > my
> >> > > > > team next Wednesday on the plan for the use of resources.
> >> > > > >
> >> > > > > Regards,
> >> > > > > Edmon
> >> > > > >
> >> > > > >
> >> > > > > On Saturday, September 5, 2015, Ted Dunning <
> >> ted.dunning@gmail.com
> >> > > > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
> >> > > > >
> >> > > > >> Edmon,
> >> > > > >>
> >> > > > >> This is very interesting.  I am sure that public
> >> acknowledgements of
> >> > > > >> contributions are easily managed.
> >> > > > >>
> >> > > > >> What might be even more useful for you would be small scale
> >> > > > publications,
> >> > > > >> especially about the problems of shoe-horning real-world data
> >> > objects
> >> > > > into
> >> > > > >> the quasi-relational model of Drill.
> >> > > > >>
> >> > > > >> What would be problematic (and what is probably just a matter
> of
> >> > > > >> nomenclature) is naming of an institution by the Apache
> specific
> >> > term
> >> > > > >> "committer" (you said commitment). Individuals at your
> >> institution
> >> > > would
> >> > > > >> absolutely be up for being committers as they demonstrate a
> track
> >> > > record
> >> > > > >> of
> >> > > > >> contribution.
> >> > > > >>
> >> > > > >> I would expect no need for any paperwork between JICS and
> Apache
> >> > > unless
> >> > > > >> you
> >> > > > >> would like to execute a corporate contributor license to ensure
> >> that
> >> > > > >> particular individuals are specifically empowered to contribute
> >> > code.
> >> > > I
> >> > > > >> don't know that the position of JICS is relative to
> intellectual
> >> > > > property,
> >> > > > >> though, so it might be worth checking out institutional policy
> on
> >> > your
> >> > > > >> side
> >> > > > >> on how individuals can contribute to open source projects. It
> >> > > shouldn't
> >> > > > be
> >> > > > >> too hard since there are quite a number of NSF funded people
> who
> >> do
> >> > > > >> contribute.
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <
> ebegoli@gmail.com>
> >> > > wrote:
> >> > > > >>
> >> > > > >> > I can work with my institution and the NSF that we committ
> the
> >> > time
> >> > > on
> >> > > > >> the
> >> > > > >> > Beacon supercomputing cluster to Apache and the Drill
> project.
> >> > Maybe
> >> > > > 20
> >> > > > >> > hours a month for 4-5 nodes.
> >> > > > >> >
> >> > > > >> > I have discretionary hours that I can put in, and I can, with
> >> our
> >> > > > >> > HPC admins, create deploy scripts on few clustered machines
> >> (these
> >> > > are
> >> > > > >> all
> >> > > > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect,
> >> and
> >> > > > >> > with local 1 TB SSD each). There is also Medusa 10 PB
> >> filesystem
> >> > > > >> attached
> >> > > > >> > but HDFS over local drives would probably be better.
> >> > > > >> > They are otherwise just a regular machines, and run regular
> >> JVMs
> >> > on
> >> > > > >> Linux.
> >> > > > >> >
> >> > > > >> > We can also get Rahul an access with a secure token to setup
> >> > > > >> > and run stress/performance/integration tests for Drill. I can
> >> > > actually
> >> > > > >> help
> >> > > > >> > there as well. This can be automated to run tests and collect
> >> > > results.
> >> > > > >> >
> >> > > > >> > I think that the only requirement would be that the JICS team
> >> be
> >> > > named
> >> > > > >> for
> >> > > > >> > commitment because both NSF/XSEDE and UT like to see the
> >> resources
> >> > > > >> > being officially used and acknowledged. They are there to
> >> support
> >> > > open
> >> > > > >> and
> >> > > > >> > academic research; open source projects fit well.
> >> > > > >> >
> >> > > > >> > If this sounds OK with the project PMCs, I can start the
> >> process
> >> > of
> >> > > > >> > allocation, accounts creation, setup.
> >> > > > >> >
> >> > > > >> > I would also, as a CDO, of JICS sign whatever standard papers
> >> with
> >> > > > >> > the Apache organization.
> >> > > > >> >
> >> > > > >> > With all this being said, let me know please if this is
> >> something
> >> > we
> >> > > > >> want
> >> > > > >> > to pursue.
> >> > > > >> >
> >> > > > >> > Thank you,
> >> > > > >> > Edmon
> >> > > > >> >
> >> > > > >> > On Tuesday, September 1, 2015, Jacques Nadeau <
> >> jacques@dremio.com
> >> > >
> >> > > > >> wrote:
> >> > > > >> >
> >> > > > >> > > I spent a bunch of time looking at the Phi coprocessors and
> >> > forgot
> >> > > > to
> >> > > > >> get
> >> > > > >> > > back to the thread. I'd love it if someone spent some time
> >> > looking
> >> > > > at
> >> > > > >> > > leveraging them (since Drill is frequently processor
> bound).
> >> > Any
> >> > > > >> takers?
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > --
> >> > > > >> > > Jacques Nadeau
> >> > > > >> > > CTO and Co-Founder, Dremio
> >> > > > >> > >
> >> > > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> >> > > parthc@apache.org
> >> > > > >> > > <javascript:;>> wrote:
> >> > > > >> > >
> >> > > > >> > > > Hi Edmon,
> >> > > > >> > > >   Sorry no one seems to have got back to you on this.
> >> > > > >> > > >   We are in the process of publishing a test suite for
> >> > > regression
> >> > > > >> > testing
> >> > > > >> > > > Drill and the cluster you have (even a few nodes ) would
> >> be a
> >> > > > great
> >> > > > >> > > > resource for folks to run the test suite. Rahul, et al
> are
> >> > > working
> >> > > > >> on
> >> > > > >> > > this
> >> > > > >> > > > and I would suggest watching out for Rahul's posts on the
> >> > topic.
> >> > > > >> > > >
> >> > > > >> > > > Parth
> >> > > > >> > > >
> >> > > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> >> > > ebegoli@gmail.com
> >> > > > >> > > <javascript:;>> wrote:
> >> > > > >> > > >
> >> > > > >> > > > > Hey folks,
> >> > > > >> > > > >
> >> > > > >> > > > > As we discussed today on a hangout, this is a machine
> >> that
> >> > we
> >> > > > >> have at
> >> > > > >> > > > > JICS/NICS
> >> > > > >> > > > > where I have Drill installed and where I could set up a
> >> test
> >> > > > >> cluster
> >> > > > >> > > over
> >> > > > >> > > > > few nodes.
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > >
> >> > > > >>
> >> > >
> >> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> >> > > > >> > > > >
> >> > > > >> > > > > Note that each node is:
> >> > > > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> >> > > > >> > > > > - 256 GB of memory
> >> > > > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of
> >> memory
> >> > > each
> >> > > > >> > > > > - 960 GB of SSD storage
> >> > > > >> > > > >
> >> > > > >> > > > > Would someone advise on what would be an interesting
> test
> >> > > setup?
> >> > > > >> > > > >
> >> > > > >> > > > > Thank you,
> >> > > > >> > > > > Edmon
> >> > > > >> > > > >
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
This request has been approved. I will get more details tomorrow.

I could add to the resource few members of the Drill team, maybe one person
from MapR and one from Dremio
who can have access and can assist in configuring (or instructing resource
sysadmins) how to run the big tests, if desired.
They will need to apply and get RSA tokens.

Then we can talk how to make this resource a part of the regular testing
and benchmarking process.

Thank you,
Edmon

On Fri, Sep 18, 2015 at 8:00 PM, Edmon Begoli <eb...@gmail.com> wrote:

> I requested 5000 hours a year on Beacon for Apache Drill for high
> performance benchmarking, testing and optimization.
> I will let you know of the resolution pretty soon. I expect these
> resources to be awarded to the project.
>
>
> On Fri, Sep 18, 2015 at 6:22 PM, Parth Chandra <pc...@maprtech.com>
> wrote:
>
>> +1 on running the build and tests.
>> If we need to run some kind of stress tests, we could consider running
>> TPC-H/TPC-DS at large scale factors.
>>
>> On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>
>> > Not offhand. It really depends on how the time would work. For example,
>> it
>> > would be nice if we had an automated perfectly fressh (no .m2/repo)
>> nightly
>> > build and full test suite run so people can always check the status.
>> Maybe
>> > we use this hardware for that?
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
>> > challapallirahul@gmail.com> wrote:
>> >
>> > > Edmon,
>> > >
>> > > We do have the tests available now [1].
>> > >
>> > > Jacques,
>> > >
>> > > You expressed interest in making these tests available on an Amazon
>> > cluster
>> > > so that users need not have physical hardware required to run these
>> > tests.
>> > > Do you have any specific thoughts on how to leverage the resources
>> that
>> > > Edmon is willing to contribute (performance testing?)
>> > >
>> > >
>> > > [1] https://github.com/mapr/drill-test-framework
>> > >
>> > > - Rahul
>> > >
>> > > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com>
>> wrote:
>> > >
>> > > > I discussed this idea of bringing large compute resource yesterday
>> with
>> > > my
>> > > > team at JICS to the project, and there was a general consensus that
>> it
>> > > can
>> > > > be committed.
>> > > >
>> > > > I will request and hopefully commit pretty large set of
>> > > > clustered CPU/storage resources for the needs of a Drill project.
>> > > >
>> > > > I will be the PI for the resource, and could give access to
>> whomever we
>> > > > want to designate from the Drill project side.
>> > > >
>> > > > Just let me know. I should have project approved within few days.
>> > > >
>> > > > Edmon
>> > > >
>> > > >
>> > > > On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Ted,
>> > > > >
>> > > > > It is actually very easy and painless to do what I am proposing. I
>> > > > > probably made it sound far more bureaucratic/legalistic than it
>> > really
>> > > > is.
>> > > > >
>> > > > > Researchers and projects from across the globe can apply for
>> cycles
>> > on
>> > > > > Beacon or any other HPC platform we run. (Beacon is by far the
>> best
>> > and
>> > > > we
>> > > > > already have a setup to run Spark and Hive on it. (We just
>> published
>> > > > paper
>> > > > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark
>> to
>> > run
>> > > > > JVM-bound jobs))
>> > > > >
>> > > > > As for use of resources, at the end of year we need to submit
>> reports
>> > > for
>> > > > > all the projects that used compute resources and how.
>> > > > > It is part of our mission, as being one of the XSEDE centers, to
>> > > > > help promote the advancement of the science and technology.
>> > > > > Reports from Principal Investigators (PI) show how we did it. In
>> this
>> > > > > case, I can be a PI and have any/someone from the Drill team
>> assigned
>> > > > > access.
>> > > > >
>> > > > > I don't think there are any IP issues. Open source project, open
>> > > research
>> > > > > institution, use of resources for testing and benchmarking. We
>> could
>> > > > > actually make JICS a benchmarking site for Drill (and even other
>> > Apache
>> > > > > projects).
>> > > > >
>> > > > > We'll discuss other details in a hangout. I am also planning to
>> brief
>> > > my
>> > > > > team next Wednesday on the plan for the use of resources.
>> > > > >
>> > > > > Regards,
>> > > > > Edmon
>> > > > >
>> > > > >
>> > > > > On Saturday, September 5, 2015, Ted Dunning <
>> ted.dunning@gmail.com
>> > > > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
>> > > > >
>> > > > >> Edmon,
>> > > > >>
>> > > > >> This is very interesting.  I am sure that public
>> acknowledgements of
>> > > > >> contributions are easily managed.
>> > > > >>
>> > > > >> What might be even more useful for you would be small scale
>> > > > publications,
>> > > > >> especially about the problems of shoe-horning real-world data
>> > objects
>> > > > into
>> > > > >> the quasi-relational model of Drill.
>> > > > >>
>> > > > >> What would be problematic (and what is probably just a matter of
>> > > > >> nomenclature) is naming of an institution by the Apache specific
>> > term
>> > > > >> "committer" (you said commitment). Individuals at your
>> institution
>> > > would
>> > > > >> absolutely be up for being committers as they demonstrate a track
>> > > record
>> > > > >> of
>> > > > >> contribution.
>> > > > >>
>> > > > >> I would expect no need for any paperwork between JICS and Apache
>> > > unless
>> > > > >> you
>> > > > >> would like to execute a corporate contributor license to ensure
>> that
>> > > > >> particular individuals are specifically empowered to contribute
>> > code.
>> > > I
>> > > > >> don't know that the position of JICS is relative to intellectual
>> > > > property,
>> > > > >> though, so it might be worth checking out institutional policy on
>> > your
>> > > > >> side
>> > > > >> on how individuals can contribute to open source projects. It
>> > > shouldn't
>> > > > be
>> > > > >> too hard since there are quite a number of NSF funded people who
>> do
>> > > > >> contribute.
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com>
>> > > wrote:
>> > > > >>
>> > > > >> > I can work with my institution and the NSF that we committ the
>> > time
>> > > on
>> > > > >> the
>> > > > >> > Beacon supercomputing cluster to Apache and the Drill project.
>> > Maybe
>> > > > 20
>> > > > >> > hours a month for 4-5 nodes.
>> > > > >> >
>> > > > >> > I have discretionary hours that I can put in, and I can, with
>> our
>> > > > >> > HPC admins, create deploy scripts on few clustered machines
>> (these
>> > > are
>> > > > >> all
>> > > > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect,
>> and
>> > > > >> > with local 1 TB SSD each). There is also Medusa 10 PB
>> filesystem
>> > > > >> attached
>> > > > >> > but HDFS over local drives would probably be better.
>> > > > >> > They are otherwise just a regular machines, and run regular
>> JVMs
>> > on
>> > > > >> Linux.
>> > > > >> >
>> > > > >> > We can also get Rahul an access with a secure token to setup
>> > > > >> > and run stress/performance/integration tests for Drill. I can
>> > > actually
>> > > > >> help
>> > > > >> > there as well. This can be automated to run tests and collect
>> > > results.
>> > > > >> >
>> > > > >> > I think that the only requirement would be that the JICS team
>> be
>> > > named
>> > > > >> for
>> > > > >> > commitment because both NSF/XSEDE and UT like to see the
>> resources
>> > > > >> > being officially used and acknowledged. They are there to
>> support
>> > > open
>> > > > >> and
>> > > > >> > academic research; open source projects fit well.
>> > > > >> >
>> > > > >> > If this sounds OK with the project PMCs, I can start the
>> process
>> > of
>> > > > >> > allocation, accounts creation, setup.
>> > > > >> >
>> > > > >> > I would also, as a CDO, of JICS sign whatever standard papers
>> with
>> > > > >> > the Apache organization.
>> > > > >> >
>> > > > >> > With all this being said, let me know please if this is
>> something
>> > we
>> > > > >> want
>> > > > >> > to pursue.
>> > > > >> >
>> > > > >> > Thank you,
>> > > > >> > Edmon
>> > > > >> >
>> > > > >> > On Tuesday, September 1, 2015, Jacques Nadeau <
>> jacques@dremio.com
>> > >
>> > > > >> wrote:
>> > > > >> >
>> > > > >> > > I spent a bunch of time looking at the Phi coprocessors and
>> > forgot
>> > > > to
>> > > > >> get
>> > > > >> > > back to the thread. I'd love it if someone spent some time
>> > looking
>> > > > at
>> > > > >> > > leveraging them (since Drill is frequently processor bound).
>> > Any
>> > > > >> takers?
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > --
>> > > > >> > > Jacques Nadeau
>> > > > >> > > CTO and Co-Founder, Dremio
>> > > > >> > >
>> > > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
>> > > parthc@apache.org
>> > > > >> > > <javascript:;>> wrote:
>> > > > >> > >
>> > > > >> > > > Hi Edmon,
>> > > > >> > > >   Sorry no one seems to have got back to you on this.
>> > > > >> > > >   We are in the process of publishing a test suite for
>> > > regression
>> > > > >> > testing
>> > > > >> > > > Drill and the cluster you have (even a few nodes ) would
>> be a
>> > > > great
>> > > > >> > > > resource for folks to run the test suite. Rahul, et al are
>> > > working
>> > > > >> on
>> > > > >> > > this
>> > > > >> > > > and I would suggest watching out for Rahul's posts on the
>> > topic.
>> > > > >> > > >
>> > > > >> > > > Parth
>> > > > >> > > >
>> > > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
>> > > ebegoli@gmail.com
>> > > > >> > > <javascript:;>> wrote:
>> > > > >> > > >
>> > > > >> > > > > Hey folks,
>> > > > >> > > > >
>> > > > >> > > > > As we discussed today on a hangout, this is a machine
>> that
>> > we
>> > > > >> have at
>> > > > >> > > > > JICS/NICS
>> > > > >> > > > > where I have Drill installed and where I could set up a
>> test
>> > > > >> cluster
>> > > > >> > > over
>> > > > >> > > > > few nodes.
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > >
>> > > > >>
>> > >
>> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
>> > > > >> > > > >
>> > > > >> > > > > Note that each node is:
>> > > > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
>> > > > >> > > > > - 256 GB of memory
>> > > > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of
>> memory
>> > > each
>> > > > >> > > > > - 960 GB of SSD storage
>> > > > >> > > > >
>> > > > >> > > > > Would someone advise on what would be an interesting test
>> > > setup?
>> > > > >> > > > >
>> > > > >> > > > > Thank you,
>> > > > >> > > > > Edmon
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
I requested 5000 hours a year on Beacon for Apache Drill for high
performance benchmarking, testing and optimization.
I will let you know of the resolution pretty soon. I expect these resources
to be awarded to the project.


On Fri, Sep 18, 2015 at 6:22 PM, Parth Chandra <pc...@maprtech.com>
wrote:

> +1 on running the build and tests.
> If we need to run some kind of stress tests, we could consider running
> TPC-H/TPC-DS at large scale factors.
>
> On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > Not offhand. It really depends on how the time would work. For example,
> it
> > would be nice if we had an automated perfectly fressh (no .m2/repo)
> nightly
> > build and full test suite run so people can always check the status.
> Maybe
> > we use this hardware for that?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Edmon,
> > >
> > > We do have the tests available now [1].
> > >
> > > Jacques,
> > >
> > > You expressed interest in making these tests available on an Amazon
> > cluster
> > > so that users need not have physical hardware required to run these
> > tests.
> > > Do you have any specific thoughts on how to leverage the resources that
> > > Edmon is willing to contribute (performance testing?)
> > >
> > >
> > > [1] https://github.com/mapr/drill-test-framework
> > >
> > > - Rahul
> > >
> > > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com>
> wrote:
> > >
> > > > I discussed this idea of bringing large compute resource yesterday
> with
> > > my
> > > > team at JICS to the project, and there was a general consensus that
> it
> > > can
> > > > be committed.
> > > >
> > > > I will request and hopefully commit pretty large set of
> > > > clustered CPU/storage resources for the needs of a Drill project.
> > > >
> > > > I will be the PI for the resource, and could give access to whomever
> we
> > > > want to designate from the Drill project side.
> > > >
> > > > Just let me know. I should have project approved within few days.
> > > >
> > > > Edmon
> > > >
> > > >
> > > > On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com>
> > wrote:
> > > >
> > > > > Ted,
> > > > >
> > > > > It is actually very easy and painless to do what I am proposing. I
> > > > > probably made it sound far more bureaucratic/legalistic than it
> > really
> > > > is.
> > > > >
> > > > > Researchers and projects from across the globe can apply for cycles
> > on
> > > > > Beacon or any other HPC platform we run. (Beacon is by far the best
> > and
> > > > we
> > > > > already have a setup to run Spark and Hive on it. (We just
> published
> > > > paper
> > > > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to
> > run
> > > > > JVM-bound jobs))
> > > > >
> > > > > As for use of resources, at the end of year we need to submit
> reports
> > > for
> > > > > all the projects that used compute resources and how.
> > > > > It is part of our mission, as being one of the XSEDE centers, to
> > > > > help promote the advancement of the science and technology.
> > > > > Reports from Principal Investigators (PI) show how we did it. In
> this
> > > > > case, I can be a PI and have any/someone from the Drill team
> assigned
> > > > > access.
> > > > >
> > > > > I don't think there are any IP issues. Open source project, open
> > > research
> > > > > institution, use of resources for testing and benchmarking. We
> could
> > > > > actually make JICS a benchmarking site for Drill (and even other
> > Apache
> > > > > projects).
> > > > >
> > > > > We'll discuss other details in a hangout. I am also planning to
> brief
> > > my
> > > > > team next Wednesday on the plan for the use of resources.
> > > > >
> > > > > Regards,
> > > > > Edmon
> > > > >
> > > > >
> > > > > On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> > > > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
> > > > >
> > > > >> Edmon,
> > > > >>
> > > > >> This is very interesting.  I am sure that public acknowledgements
> of
> > > > >> contributions are easily managed.
> > > > >>
> > > > >> What might be even more useful for you would be small scale
> > > > publications,
> > > > >> especially about the problems of shoe-horning real-world data
> > objects
> > > > into
> > > > >> the quasi-relational model of Drill.
> > > > >>
> > > > >> What would be problematic (and what is probably just a matter of
> > > > >> nomenclature) is naming of an institution by the Apache specific
> > term
> > > > >> "committer" (you said commitment). Individuals at your institution
> > > would
> > > > >> absolutely be up for being committers as they demonstrate a track
> > > record
> > > > >> of
> > > > >> contribution.
> > > > >>
> > > > >> I would expect no need for any paperwork between JICS and Apache
> > > unless
> > > > >> you
> > > > >> would like to execute a corporate contributor license to ensure
> that
> > > > >> particular individuals are specifically empowered to contribute
> > code.
> > > I
> > > > >> don't know that the position of JICS is relative to intellectual
> > > > property,
> > > > >> though, so it might be worth checking out institutional policy on
> > your
> > > > >> side
> > > > >> on how individuals can contribute to open source projects. It
> > > shouldn't
> > > > be
> > > > >> too hard since there are quite a number of NSF funded people who
> do
> > > > >> contribute.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com>
> > > wrote:
> > > > >>
> > > > >> > I can work with my institution and the NSF that we committ the
> > time
> > > on
> > > > >> the
> > > > >> > Beacon supercomputing cluster to Apache and the Drill project.
> > Maybe
> > > > 20
> > > > >> > hours a month for 4-5 nodes.
> > > > >> >
> > > > >> > I have discretionary hours that I can put in, and I can, with
> our
> > > > >> > HPC admins, create deploy scripts on few clustered machines
> (these
> > > are
> > > > >> all
> > > > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect,
> and
> > > > >> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
> > > > >> attached
> > > > >> > but HDFS over local drives would probably be better.
> > > > >> > They are otherwise just a regular machines, and run regular JVMs
> > on
> > > > >> Linux.
> > > > >> >
> > > > >> > We can also get Rahul an access with a secure token to setup
> > > > >> > and run stress/performance/integration tests for Drill. I can
> > > actually
> > > > >> help
> > > > >> > there as well. This can be automated to run tests and collect
> > > results.
> > > > >> >
> > > > >> > I think that the only requirement would be that the JICS team be
> > > named
> > > > >> for
> > > > >> > commitment because both NSF/XSEDE and UT like to see the
> resources
> > > > >> > being officially used and acknowledged. They are there to
> support
> > > open
> > > > >> and
> > > > >> > academic research; open source projects fit well.
> > > > >> >
> > > > >> > If this sounds OK with the project PMCs, I can start the process
> > of
> > > > >> > allocation, accounts creation, setup.
> > > > >> >
> > > > >> > I would also, as a CDO, of JICS sign whatever standard papers
> with
> > > > >> > the Apache organization.
> > > > >> >
> > > > >> > With all this being said, let me know please if this is
> something
> > we
> > > > >> want
> > > > >> > to pursue.
> > > > >> >
> > > > >> > Thank you,
> > > > >> > Edmon
> > > > >> >
> > > > >> > On Tuesday, September 1, 2015, Jacques Nadeau <
> jacques@dremio.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > I spent a bunch of time looking at the Phi coprocessors and
> > forgot
> > > > to
> > > > >> get
> > > > >> > > back to the thread. I'd love it if someone spent some time
> > looking
> > > > at
> > > > >> > > leveraging them (since Drill is frequently processor bound).
> > Any
> > > > >> takers?
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Jacques Nadeau
> > > > >> > > CTO and Co-Founder, Dremio
> > > > >> > >
> > > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> > > parthc@apache.org
> > > > >> > > <javascript:;>> wrote:
> > > > >> > >
> > > > >> > > > Hi Edmon,
> > > > >> > > >   Sorry no one seems to have got back to you on this.
> > > > >> > > >   We are in the process of publishing a test suite for
> > > regression
> > > > >> > testing
> > > > >> > > > Drill and the cluster you have (even a few nodes ) would be
> a
> > > > great
> > > > >> > > > resource for folks to run the test suite. Rahul, et al are
> > > working
> > > > >> on
> > > > >> > > this
> > > > >> > > > and I would suggest watching out for Rahul's posts on the
> > topic.
> > > > >> > > >
> > > > >> > > > Parth
> > > > >> > > >
> > > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> > > ebegoli@gmail.com
> > > > >> > > <javascript:;>> wrote:
> > > > >> > > >
> > > > >> > > > > Hey folks,
> > > > >> > > > >
> > > > >> > > > > As we discussed today on a hangout, this is a machine that
> > we
> > > > >> have at
> > > > >> > > > > JICS/NICS
> > > > >> > > > > where I have Drill installed and where I could set up a
> test
> > > > >> cluster
> > > > >> > > over
> > > > >> > > > > few nodes.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > >
> > > > >>
> > >
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > > > >> > > > >
> > > > >> > > > > Note that each node is:
> > > > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > > >> > > > > - 256 GB of memory
> > > > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of
> memory
> > > each
> > > > >> > > > > - 960 GB of SSD storage
> > > > >> > > > >
> > > > >> > > > > Would someone advise on what would be an interesting test
> > > setup?
> > > > >> > > > >
> > > > >> > > > > Thank you,
> > > > >> > > > > Edmon
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Parth Chandra <pc...@maprtech.com>.
+1 on running the build and tests.
If we need to run some kind of stress tests, we could consider running
TPC-H/TPC-DS at large scale factors.

On Fri, Sep 18, 2015 at 2:24 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Not offhand. It really depends on how the time would work. For example, it
> would be nice if we had an automated perfectly fressh (no .m2/repo) nightly
> build and full test suite run so people can always check the status. Maybe
> we use this hardware for that?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Edmon,
> >
> > We do have the tests available now [1].
> >
> > Jacques,
> >
> > You expressed interest in making these tests available on an Amazon
> cluster
> > so that users need not have physical hardware required to run these
> tests.
> > Do you have any specific thoughts on how to leverage the resources that
> > Edmon is willing to contribute (performance testing?)
> >
> >
> > [1] https://github.com/mapr/drill-test-framework
> >
> > - Rahul
> >
> > On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com> wrote:
> >
> > > I discussed this idea of bringing large compute resource yesterday with
> > my
> > > team at JICS to the project, and there was a general consensus that it
> > can
> > > be committed.
> > >
> > > I will request and hopefully commit pretty large set of
> > > clustered CPU/storage resources for the needs of a Drill project.
> > >
> > > I will be the PI for the resource, and could give access to whomever we
> > > want to designate from the Drill project side.
> > >
> > > Just let me know. I should have project approved within few days.
> > >
> > > Edmon
> > >
> > >
> > > On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com>
> wrote:
> > >
> > > > Ted,
> > > >
> > > > It is actually very easy and painless to do what I am proposing. I
> > > > probably made it sound far more bureaucratic/legalistic than it
> really
> > > is.
> > > >
> > > > Researchers and projects from across the globe can apply for cycles
> on
> > > > Beacon or any other HPC platform we run. (Beacon is by far the best
> and
> > > we
> > > > already have a setup to run Spark and Hive on it. (We just published
> > > paper
> > > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to
> run
> > > > JVM-bound jobs))
> > > >
> > > > As for use of resources, at the end of year we need to submit reports
> > for
> > > > all the projects that used compute resources and how.
> > > > It is part of our mission, as being one of the XSEDE centers, to
> > > > help promote the advancement of the science and technology.
> > > > Reports from Principal Investigators (PI) show how we did it. In this
> > > > case, I can be a PI and have any/someone from the Drill team assigned
> > > > access.
> > > >
> > > > I don't think there are any IP issues. Open source project, open
> > research
> > > > institution, use of resources for testing and benchmarking. We could
> > > > actually make JICS a benchmarking site for Drill (and even other
> Apache
> > > > projects).
> > > >
> > > > We'll discuss other details in a hangout. I am also planning to brief
> > my
> > > > team next Wednesday on the plan for the use of resources.
> > > >
> > > > Regards,
> > > > Edmon
> > > >
> > > >
> > > > On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> > > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
> > > >
> > > >> Edmon,
> > > >>
> > > >> This is very interesting.  I am sure that public acknowledgements of
> > > >> contributions are easily managed.
> > > >>
> > > >> What might be even more useful for you would be small scale
> > > publications,
> > > >> especially about the problems of shoe-horning real-world data
> objects
> > > into
> > > >> the quasi-relational model of Drill.
> > > >>
> > > >> What would be problematic (and what is probably just a matter of
> > > >> nomenclature) is naming of an institution by the Apache specific
> term
> > > >> "committer" (you said commitment). Individuals at your institution
> > would
> > > >> absolutely be up for being committers as they demonstrate a track
> > record
> > > >> of
> > > >> contribution.
> > > >>
> > > >> I would expect no need for any paperwork between JICS and Apache
> > unless
> > > >> you
> > > >> would like to execute a corporate contributor license to ensure that
> > > >> particular individuals are specifically empowered to contribute
> code.
> > I
> > > >> don't know that the position of JICS is relative to intellectual
> > > property,
> > > >> though, so it might be worth checking out institutional policy on
> your
> > > >> side
> > > >> on how individuals can contribute to open source projects. It
> > shouldn't
> > > be
> > > >> too hard since there are quite a number of NSF funded people who do
> > > >> contribute.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com>
> > wrote:
> > > >>
> > > >> > I can work with my institution and the NSF that we committ the
> time
> > on
> > > >> the
> > > >> > Beacon supercomputing cluster to Apache and the Drill project.
> Maybe
> > > 20
> > > >> > hours a month for 4-5 nodes.
> > > >> >
> > > >> > I have discretionary hours that I can put in, and I can, with our
> > > >> > HPC admins, create deploy scripts on few clustered machines (these
> > are
> > > >> all
> > > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> > > >> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
> > > >> attached
> > > >> > but HDFS over local drives would probably be better.
> > > >> > They are otherwise just a regular machines, and run regular JVMs
> on
> > > >> Linux.
> > > >> >
> > > >> > We can also get Rahul an access with a secure token to setup
> > > >> > and run stress/performance/integration tests for Drill. I can
> > actually
> > > >> help
> > > >> > there as well. This can be automated to run tests and collect
> > results.
> > > >> >
> > > >> > I think that the only requirement would be that the JICS team be
> > named
> > > >> for
> > > >> > commitment because both NSF/XSEDE and UT like to see the resources
> > > >> > being officially used and acknowledged. They are there to support
> > open
> > > >> and
> > > >> > academic research; open source projects fit well.
> > > >> >
> > > >> > If this sounds OK with the project PMCs, I can start the process
> of
> > > >> > allocation, accounts creation, setup.
> > > >> >
> > > >> > I would also, as a CDO, of JICS sign whatever standard papers with
> > > >> > the Apache organization.
> > > >> >
> > > >> > With all this being said, let me know please if this is something
> we
> > > >> want
> > > >> > to pursue.
> > > >> >
> > > >> > Thank you,
> > > >> > Edmon
> > > >> >
> > > >> > On Tuesday, September 1, 2015, Jacques Nadeau <jacques@dremio.com
> >
> > > >> wrote:
> > > >> >
> > > >> > > I spent a bunch of time looking at the Phi coprocessors and
> forgot
> > > to
> > > >> get
> > > >> > > back to the thread. I'd love it if someone spent some time
> looking
> > > at
> > > >> > > leveraging them (since Drill is frequently processor bound).
> Any
> > > >> takers?
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Jacques Nadeau
> > > >> > > CTO and Co-Founder, Dremio
> > > >> > >
> > > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> > parthc@apache.org
> > > >> > > <javascript:;>> wrote:
> > > >> > >
> > > >> > > > Hi Edmon,
> > > >> > > >   Sorry no one seems to have got back to you on this.
> > > >> > > >   We are in the process of publishing a test suite for
> > regression
> > > >> > testing
> > > >> > > > Drill and the cluster you have (even a few nodes ) would be a
> > > great
> > > >> > > > resource for folks to run the test suite. Rahul, et al are
> > working
> > > >> on
> > > >> > > this
> > > >> > > > and I would suggest watching out for Rahul's posts on the
> topic.
> > > >> > > >
> > > >> > > > Parth
> > > >> > > >
> > > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> > ebegoli@gmail.com
> > > >> > > <javascript:;>> wrote:
> > > >> > > >
> > > >> > > > > Hey folks,
> > > >> > > > >
> > > >> > > > > As we discussed today on a hangout, this is a machine that
> we
> > > >> have at
> > > >> > > > > JICS/NICS
> > > >> > > > > where I have Drill installed and where I could set up a test
> > > >> cluster
> > > >> > > over
> > > >> > > > > few nodes.
> > > >> > > > >
> > > >> > > > >
> > > >> > >
> > > >>
> > https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > > >> > > > >
> > > >> > > > > Note that each node is:
> > > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > >> > > > > - 256 GB of memory
> > > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory
> > each
> > > >> > > > > - 960 GB of SSD storage
> > > >> > > > >
> > > >> > > > > Would someone advise on what would be an interesting test
> > setup?
> > > >> > > > >
> > > >> > > > > Thank you,
> > > >> > > > > Edmon
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Jacques Nadeau <ja...@dremio.com>.
Not offhand. It really depends on how the time would work. For example, it
would be nice if we had an automated perfectly fressh (no .m2/repo) nightly
build and full test suite run so people can always check the status. Maybe
we use this hardware for that?

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Sep 18, 2015 at 9:48 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Edmon,
>
> We do have the tests available now [1].
>
> Jacques,
>
> You expressed interest in making these tests available on an Amazon cluster
> so that users need not have physical hardware required to run these tests.
> Do you have any specific thoughts on how to leverage the resources that
> Edmon is willing to contribute (performance testing?)
>
>
> [1] https://github.com/mapr/drill-test-framework
>
> - Rahul
>
> On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com> wrote:
>
> > I discussed this idea of bringing large compute resource yesterday with
> my
> > team at JICS to the project, and there was a general consensus that it
> can
> > be committed.
> >
> > I will request and hopefully commit pretty large set of
> > clustered CPU/storage resources for the needs of a Drill project.
> >
> > I will be the PI for the resource, and could give access to whomever we
> > want to designate from the Drill project side.
> >
> > Just let me know. I should have project approved within few days.
> >
> > Edmon
> >
> >
> > On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com> wrote:
> >
> > > Ted,
> > >
> > > It is actually very easy and painless to do what I am proposing. I
> > > probably made it sound far more bureaucratic/legalistic than it really
> > is.
> > >
> > > Researchers and projects from across the globe can apply for cycles on
> > > Beacon or any other HPC platform we run. (Beacon is by far the best and
> > we
> > > already have a setup to run Spark and Hive on it. (We just published
> > paper
> > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to run
> > > JVM-bound jobs))
> > >
> > > As for use of resources, at the end of year we need to submit reports
> for
> > > all the projects that used compute resources and how.
> > > It is part of our mission, as being one of the XSEDE centers, to
> > > help promote the advancement of the science and technology.
> > > Reports from Principal Investigators (PI) show how we did it. In this
> > > case, I can be a PI and have any/someone from the Drill team assigned
> > > access.
> > >
> > > I don't think there are any IP issues. Open source project, open
> research
> > > institution, use of resources for testing and benchmarking. We could
> > > actually make JICS a benchmarking site for Drill (and even other Apache
> > > projects).
> > >
> > > We'll discuss other details in a hangout. I am also planning to brief
> my
> > > team next Wednesday on the plan for the use of resources.
> > >
> > > Regards,
> > > Edmon
> > >
> > >
> > > On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
> > >
> > >> Edmon,
> > >>
> > >> This is very interesting.  I am sure that public acknowledgements of
> > >> contributions are easily managed.
> > >>
> > >> What might be even more useful for you would be small scale
> > publications,
> > >> especially about the problems of shoe-horning real-world data objects
> > into
> > >> the quasi-relational model of Drill.
> > >>
> > >> What would be problematic (and what is probably just a matter of
> > >> nomenclature) is naming of an institution by the Apache specific term
> > >> "committer" (you said commitment). Individuals at your institution
> would
> > >> absolutely be up for being committers as they demonstrate a track
> record
> > >> of
> > >> contribution.
> > >>
> > >> I would expect no need for any paperwork between JICS and Apache
> unless
> > >> you
> > >> would like to execute a corporate contributor license to ensure that
> > >> particular individuals are specifically empowered to contribute code.
> I
> > >> don't know that the position of JICS is relative to intellectual
> > property,
> > >> though, so it might be worth checking out institutional policy on your
> > >> side
> > >> on how individuals can contribute to open source projects. It
> shouldn't
> > be
> > >> too hard since there are quite a number of NSF funded people who do
> > >> contribute.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com>
> wrote:
> > >>
> > >> > I can work with my institution and the NSF that we committ the time
> on
> > >> the
> > >> > Beacon supercomputing cluster to Apache and the Drill project. Maybe
> > 20
> > >> > hours a month for 4-5 nodes.
> > >> >
> > >> > I have discretionary hours that I can put in, and I can, with our
> > >> > HPC admins, create deploy scripts on few clustered machines (these
> are
> > >> all
> > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> > >> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
> > >> attached
> > >> > but HDFS over local drives would probably be better.
> > >> > They are otherwise just a regular machines, and run regular JVMs on
> > >> Linux.
> > >> >
> > >> > We can also get Rahul an access with a secure token to setup
> > >> > and run stress/performance/integration tests for Drill. I can
> actually
> > >> help
> > >> > there as well. This can be automated to run tests and collect
> results.
> > >> >
> > >> > I think that the only requirement would be that the JICS team be
> named
> > >> for
> > >> > commitment because both NSF/XSEDE and UT like to see the resources
> > >> > being officially used and acknowledged. They are there to support
> open
> > >> and
> > >> > academic research; open source projects fit well.
> > >> >
> > >> > If this sounds OK with the project PMCs, I can start the process of
> > >> > allocation, accounts creation, setup.
> > >> >
> > >> > I would also, as a CDO, of JICS sign whatever standard papers with
> > >> > the Apache organization.
> > >> >
> > >> > With all this being said, let me know please if this is something we
> > >> want
> > >> > to pursue.
> > >> >
> > >> > Thank you,
> > >> > Edmon
> > >> >
> > >> > On Tuesday, September 1, 2015, Jacques Nadeau <ja...@dremio.com>
> > >> wrote:
> > >> >
> > >> > > I spent a bunch of time looking at the Phi coprocessors and forgot
> > to
> > >> get
> > >> > > back to the thread. I'd love it if someone spent some time looking
> > at
> > >> > > leveraging them (since Drill is frequently processor bound).  Any
> > >> takers?
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Jacques Nadeau
> > >> > > CTO and Co-Founder, Dremio
> > >> > >
> > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> parthc@apache.org
> > >> > > <javascript:;>> wrote:
> > >> > >
> > >> > > > Hi Edmon,
> > >> > > >   Sorry no one seems to have got back to you on this.
> > >> > > >   We are in the process of publishing a test suite for
> regression
> > >> > testing
> > >> > > > Drill and the cluster you have (even a few nodes ) would be a
> > great
> > >> > > > resource for folks to run the test suite. Rahul, et al are
> working
> > >> on
> > >> > > this
> > >> > > > and I would suggest watching out for Rahul's posts on the topic.
> > >> > > >
> > >> > > > Parth
> > >> > > >
> > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> ebegoli@gmail.com
> > >> > > <javascript:;>> wrote:
> > >> > > >
> > >> > > > > Hey folks,
> > >> > > > >
> > >> > > > > As we discussed today on a hangout, this is a machine that we
> > >> have at
> > >> > > > > JICS/NICS
> > >> > > > > where I have Drill installed and where I could set up a test
> > >> cluster
> > >> > > over
> > >> > > > > few nodes.
> > >> > > > >
> > >> > > > >
> > >> > >
> > >>
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > >> > > > >
> > >> > > > > Note that each node is:
> > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > >> > > > > - 256 GB of memory
> > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory
> each
> > >> > > > > - 960 GB of SSD storage
> > >> > > > >
> > >> > > > > Would someone advise on what would be an interesting test
> setup?
> > >> > > > >
> > >> > > > > Thank you,
> > >> > > > > Edmon
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
We could use JICS/NICS resources to run memory stress tests - jobs
requiring high RAM. Also, cluster stress tests.

That is expensive to do on AWS. Plus we can provide some sys admin support.

On Friday, September 18, 2015, rahul challapalli <ch...@gmail.com>
wrote:

> Edmon,
>
> We do have the tests available now [1].
>
> Jacques,
>
> You expressed interest in making these tests available on an Amazon cluster
> so that users need not have physical hardware required to run these tests.
> Do you have any specific thoughts on how to leverage the resources that
> Edmon is willing to contribute (performance testing?)
>
>
> [1] https://github.com/mapr/drill-test-framework
>
> - Rahul
>
> On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>> wrote:
>
> > I discussed this idea of bringing large compute resource yesterday with
> my
> > team at JICS to the project, and there was a general consensus that it
> can
> > be committed.
> >
> > I will request and hopefully commit pretty large set of
> > clustered CPU/storage resources for the needs of a Drill project.
> >
> > I will be the PI for the resource, and could give access to whomever we
> > want to designate from the Drill project side.
> >
> > Just let me know. I should have project approved within few days.
> >
> > Edmon
> >
> >
> > On Saturday, September 5, 2015, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>> wrote:
> >
> > > Ted,
> > >
> > > It is actually very easy and painless to do what I am proposing. I
> > > probably made it sound far more bureaucratic/legalistic than it really
> > is.
> > >
> > > Researchers and projects from across the globe can apply for cycles on
> > > Beacon or any other HPC platform we run. (Beacon is by far the best and
> > we
> > > already have a setup to run Spark and Hive on it. (We just published
> > paper
> > > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to run
> > > JVM-bound jobs))
> > >
> > > As for use of resources, at the end of year we need to submit reports
> for
> > > all the projects that used compute resources and how.
> > > It is part of our mission, as being one of the XSEDE centers, to
> > > help promote the advancement of the science and technology.
> > > Reports from Principal Investigators (PI) show how we did it. In this
> > > case, I can be a PI and have any/someone from the Drill team assigned
> > > access.
> > >
> > > I don't think there are any IP issues. Open source project, open
> research
> > > institution, use of resources for testing and benchmarking. We could
> > > actually make JICS a benchmarking site for Drill (and even other Apache
> > > projects).
> > >
> > > We'll discuss other details in a hangout. I am also planning to brief
> my
> > > team next Wednesday on the plan for the use of resources.
> > >
> > > Regards,
> > > Edmon
> > >
> > >
> > > On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> <javascript:;>
> > > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com <javascript:;>');>>
> wrote:
> > >
> > >> Edmon,
> > >>
> > >> This is very interesting.  I am sure that public acknowledgements of
> > >> contributions are easily managed.
> > >>
> > >> What might be even more useful for you would be small scale
> > publications,
> > >> especially about the problems of shoe-horning real-world data objects
> > into
> > >> the quasi-relational model of Drill.
> > >>
> > >> What would be problematic (and what is probably just a matter of
> > >> nomenclature) is naming of an institution by the Apache specific term
> > >> "committer" (you said commitment). Individuals at your institution
> would
> > >> absolutely be up for being committers as they demonstrate a track
> record
> > >> of
> > >> contribution.
> > >>
> > >> I would expect no need for any paperwork between JICS and Apache
> unless
> > >> you
> > >> would like to execute a corporate contributor license to ensure that
> > >> particular individuals are specifically empowered to contribute code.
> I
> > >> don't know that the position of JICS is relative to intellectual
> > property,
> > >> though, so it might be worth checking out institutional policy on your
> > >> side
> > >> on how individuals can contribute to open source projects. It
> shouldn't
> > be
> > >> too hard since there are quite a number of NSF funded people who do
> > >> contribute.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>> wrote:
> > >>
> > >> > I can work with my institution and the NSF that we committ the time
> on
> > >> the
> > >> > Beacon supercomputing cluster to Apache and the Drill project. Maybe
> > 20
> > >> > hours a month for 4-5 nodes.
> > >> >
> > >> > I have discretionary hours that I can put in, and I can, with our
> > >> > HPC admins, create deploy scripts on few clustered machines (these
> are
> > >> all
> > >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> > >> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
> > >> attached
> > >> > but HDFS over local drives would probably be better.
> > >> > They are otherwise just a regular machines, and run regular JVMs on
> > >> Linux.
> > >> >
> > >> > We can also get Rahul an access with a secure token to setup
> > >> > and run stress/performance/integration tests for Drill. I can
> actually
> > >> help
> > >> > there as well. This can be automated to run tests and collect
> results.
> > >> >
> > >> > I think that the only requirement would be that the JICS team be
> named
> > >> for
> > >> > commitment because both NSF/XSEDE and UT like to see the resources
> > >> > being officially used and acknowledged. They are there to support
> open
> > >> and
> > >> > academic research; open source projects fit well.
> > >> >
> > >> > If this sounds OK with the project PMCs, I can start the process of
> > >> > allocation, accounts creation, setup.
> > >> >
> > >> > I would also, as a CDO, of JICS sign whatever standard papers with
> > >> > the Apache organization.
> > >> >
> > >> > With all this being said, let me know please if this is something we
> > >> want
> > >> > to pursue.
> > >> >
> > >> > Thank you,
> > >> > Edmon
> > >> >
> > >> > On Tuesday, September 1, 2015, Jacques Nadeau <jacques@dremio.com
> <javascript:;>>
> > >> wrote:
> > >> >
> > >> > > I spent a bunch of time looking at the Phi coprocessors and forgot
> > to
> > >> get
> > >> > > back to the thread. I'd love it if someone spent some time looking
> > at
> > >> > > leveraging them (since Drill is frequently processor bound).  Any
> > >> takers?
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Jacques Nadeau
> > >> > > CTO and Co-Founder, Dremio
> > >> > >
> > >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <
> parthc@apache.org <javascript:;>
> > >> > > <javascript:;>> wrote:
> > >> > >
> > >> > > > Hi Edmon,
> > >> > > >   Sorry no one seems to have got back to you on this.
> > >> > > >   We are in the process of publishing a test suite for
> regression
> > >> > testing
> > >> > > > Drill and the cluster you have (even a few nodes ) would be a
> > great
> > >> > > > resource for folks to run the test suite. Rahul, et al are
> working
> > >> on
> > >> > > this
> > >> > > > and I would suggest watching out for Rahul's posts on the topic.
> > >> > > >
> > >> > > > Parth
> > >> > > >
> > >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <
> ebegoli@gmail.com <javascript:;>
> > >> > > <javascript:;>> wrote:
> > >> > > >
> > >> > > > > Hey folks,
> > >> > > > >
> > >> > > > > As we discussed today on a hangout, this is a machine that we
> > >> have at
> > >> > > > > JICS/NICS
> > >> > > > > where I have Drill installed and where I could set up a test
> > >> cluster
> > >> > > over
> > >> > > > > few nodes.
> > >> > > > >
> > >> > > > >
> > >> > >
> > >>
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > >> > > > >
> > >> > > > > Note that each node is:
> > >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > >> > > > > - 256 GB of memory
> > >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory
> each
> > >> > > > > - 960 GB of SSD storage
> > >> > > > >
> > >> > > > > Would someone advise on what would be an interesting test
> setup?
> > >> > > > >
> > >> > > > > Thank you,
> > >> > > > > Edmon
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: Potential resource for large scale testing

Posted by rahul challapalli <ch...@gmail.com>.
Edmon,

We do have the tests available now [1].

Jacques,

You expressed interest in making these tests available on an Amazon cluster
so that users need not have physical hardware required to run these tests.
Do you have any specific thoughts on how to leverage the resources that
Edmon is willing to contribute (performance testing?)


[1] https://github.com/mapr/drill-test-framework

- Rahul

On Thu, Sep 17, 2015 at 8:49 PM, Edmon Begoli <eb...@gmail.com> wrote:

> I discussed this idea of bringing large compute resource yesterday with my
> team at JICS to the project, and there was a general consensus that it can
> be committed.
>
> I will request and hopefully commit pretty large set of
> clustered CPU/storage resources for the needs of a Drill project.
>
> I will be the PI for the resource, and could give access to whomever we
> want to designate from the Drill project side.
>
> Just let me know. I should have project approved within few days.
>
> Edmon
>
>
> On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com> wrote:
>
> > Ted,
> >
> > It is actually very easy and painless to do what I am proposing. I
> > probably made it sound far more bureaucratic/legalistic than it really
> is.
> >
> > Researchers and projects from across the globe can apply for cycles on
> > Beacon or any other HPC platform we run. (Beacon is by far the best and
> we
> > already have a setup to run Spark and Hive on it. (We just published
> paper
> > about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to run
> > JVM-bound jobs))
> >
> > As for use of resources, at the end of year we need to submit reports for
> > all the projects that used compute resources and how.
> > It is part of our mission, as being one of the XSEDE centers, to
> > help promote the advancement of the science and technology.
> > Reports from Principal Investigators (PI) show how we did it. In this
> > case, I can be a PI and have any/someone from the Drill team assigned
> > access.
> >
> > I don't think there are any IP issues. Open source project, open research
> > institution, use of resources for testing and benchmarking. We could
> > actually make JICS a benchmarking site for Drill (and even other Apache
> > projects).
> >
> > We'll discuss other details in a hangout. I am also planning to brief my
> > team next Wednesday on the plan for the use of resources.
> >
> > Regards,
> > Edmon
> >
> >
> > On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> > <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
> >
> >> Edmon,
> >>
> >> This is very interesting.  I am sure that public acknowledgements of
> >> contributions are easily managed.
> >>
> >> What might be even more useful for you would be small scale
> publications,
> >> especially about the problems of shoe-horning real-world data objects
> into
> >> the quasi-relational model of Drill.
> >>
> >> What would be problematic (and what is probably just a matter of
> >> nomenclature) is naming of an institution by the Apache specific term
> >> "committer" (you said commitment). Individuals at your institution would
> >> absolutely be up for being committers as they demonstrate a track record
> >> of
> >> contribution.
> >>
> >> I would expect no need for any paperwork between JICS and Apache unless
> >> you
> >> would like to execute a corporate contributor license to ensure that
> >> particular individuals are specifically empowered to contribute code. I
> >> don't know that the position of JICS is relative to intellectual
> property,
> >> though, so it might be worth checking out institutional policy on your
> >> side
> >> on how individuals can contribute to open source projects. It shouldn't
> be
> >> too hard since there are quite a number of NSF funded people who do
> >> contribute.
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com> wrote:
> >>
> >> > I can work with my institution and the NSF that we committ the time on
> >> the
> >> > Beacon supercomputing cluster to Apache and the Drill project. Maybe
> 20
> >> > hours a month for 4-5 nodes.
> >> >
> >> > I have discretionary hours that I can put in, and I can, with our
> >> > HPC admins, create deploy scripts on few clustered machines (these are
> >> all
> >> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> >> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
> >> attached
> >> > but HDFS over local drives would probably be better.
> >> > They are otherwise just a regular machines, and run regular JVMs on
> >> Linux.
> >> >
> >> > We can also get Rahul an access with a secure token to setup
> >> > and run stress/performance/integration tests for Drill. I can actually
> >> help
> >> > there as well. This can be automated to run tests and collect results.
> >> >
> >> > I think that the only requirement would be that the JICS team be named
> >> for
> >> > commitment because both NSF/XSEDE and UT like to see the resources
> >> > being officially used and acknowledged. They are there to support open
> >> and
> >> > academic research; open source projects fit well.
> >> >
> >> > If this sounds OK with the project PMCs, I can start the process of
> >> > allocation, accounts creation, setup.
> >> >
> >> > I would also, as a CDO, of JICS sign whatever standard papers with
> >> > the Apache organization.
> >> >
> >> > With all this being said, let me know please if this is something we
> >> want
> >> > to pursue.
> >> >
> >> > Thank you,
> >> > Edmon
> >> >
> >> > On Tuesday, September 1, 2015, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >> >
> >> > > I spent a bunch of time looking at the Phi coprocessors and forgot
> to
> >> get
> >> > > back to the thread. I'd love it if someone spent some time looking
> at
> >> > > leveraging them (since Drill is frequently processor bound).  Any
> >> takers?
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Jacques Nadeau
> >> > > CTO and Co-Founder, Dremio
> >> > >
> >> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <parthc@apache.org
> >> > > <javascript:;>> wrote:
> >> > >
> >> > > > Hi Edmon,
> >> > > >   Sorry no one seems to have got back to you on this.
> >> > > >   We are in the process of publishing a test suite for regression
> >> > testing
> >> > > > Drill and the cluster you have (even a few nodes ) would be a
> great
> >> > > > resource for folks to run the test suite. Rahul, et al are working
> >> on
> >> > > this
> >> > > > and I would suggest watching out for Rahul's posts on the topic.
> >> > > >
> >> > > > Parth
> >> > > >
> >> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <ebegoli@gmail.com
> >> > > <javascript:;>> wrote:
> >> > > >
> >> > > > > Hey folks,
> >> > > > >
> >> > > > > As we discussed today on a hangout, this is a machine that we
> >> have at
> >> > > > > JICS/NICS
> >> > > > > where I have Drill installed and where I could set up a test
> >> cluster
> >> > > over
> >> > > > > few nodes.
> >> > > > >
> >> > > > >
> >> > >
> >> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> >> > > > >
> >> > > > > Note that each node is:
> >> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> >> > > > > - 256 GB of memory
> >> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> >> > > > > - 960 GB of SSD storage
> >> > > > >
> >> > > > > Would someone advise on what would be an interesting test setup?
> >> > > > >
> >> > > > > Thank you,
> >> > > > > Edmon
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
I discussed this idea of bringing large compute resource yesterday with my
team at JICS to the project, and there was a general consensus that it can
be committed.

I will request and hopefully commit pretty large set of
clustered CPU/storage resources for the needs of a Drill project.

I will be the PI for the resource, and could give access to whomever we
want to designate from the Drill project side.

Just let me know. I should have project approved within few days.

Edmon


On Saturday, September 5, 2015, Edmon Begoli <eb...@gmail.com> wrote:

> Ted,
>
> It is actually very easy and painless to do what I am proposing. I
> probably made it sound far more bureaucratic/legalistic than it really is.
>
> Researchers and projects from across the globe can apply for cycles on
> Beacon or any other HPC platform we run. (Beacon is by far the best and we
> already have a setup to run Spark and Hive on it. (We just published paper
> about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to run
> JVM-bound jobs))
>
> As for use of resources, at the end of year we need to submit reports for
> all the projects that used compute resources and how.
> It is part of our mission, as being one of the XSEDE centers, to
> help promote the advancement of the science and technology.
> Reports from Principal Investigators (PI) show how we did it. In this
> case, I can be a PI and have any/someone from the Drill team assigned
> access.
>
> I don't think there are any IP issues. Open source project, open research
> institution, use of resources for testing and benchmarking. We could
> actually make JICS a benchmarking site for Drill (and even other Apache
> projects).
>
> We'll discuss other details in a hangout. I am also planning to brief my
> team next Wednesday on the plan for the use of resources.
>
> Regards,
> Edmon
>
>
> On Saturday, September 5, 2015, Ted Dunning <ted.dunning@gmail.com
> <javascript:_e(%7B%7D,'cvml','ted.dunning@gmail.com');>> wrote:
>
>> Edmon,
>>
>> This is very interesting.  I am sure that public acknowledgements of
>> contributions are easily managed.
>>
>> What might be even more useful for you would be small scale publications,
>> especially about the problems of shoe-horning real-world data objects into
>> the quasi-relational model of Drill.
>>
>> What would be problematic (and what is probably just a matter of
>> nomenclature) is naming of an institution by the Apache specific term
>> "committer" (you said commitment). Individuals at your institution would
>> absolutely be up for being committers as they demonstrate a track record
>> of
>> contribution.
>>
>> I would expect no need for any paperwork between JICS and Apache unless
>> you
>> would like to execute a corporate contributor license to ensure that
>> particular individuals are specifically empowered to contribute code. I
>> don't know that the position of JICS is relative to intellectual property,
>> though, so it might be worth checking out institutional policy on your
>> side
>> on how individuals can contribute to open source projects. It shouldn't be
>> too hard since there are quite a number of NSF funded people who do
>> contribute.
>>
>>
>>
>>
>>
>> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com> wrote:
>>
>> > I can work with my institution and the NSF that we committ the time on
>> the
>> > Beacon supercomputing cluster to Apache and the Drill project. Maybe 20
>> > hours a month for 4-5 nodes.
>> >
>> > I have discretionary hours that I can put in, and I can, with our
>> > HPC admins, create deploy scripts on few clustered machines (these are
>> all
>> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
>> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem
>> attached
>> > but HDFS over local drives would probably be better.
>> > They are otherwise just a regular machines, and run regular JVMs on
>> Linux.
>> >
>> > We can also get Rahul an access with a secure token to setup
>> > and run stress/performance/integration tests for Drill. I can actually
>> help
>> > there as well. This can be automated to run tests and collect results.
>> >
>> > I think that the only requirement would be that the JICS team be named
>> for
>> > commitment because both NSF/XSEDE and UT like to see the resources
>> > being officially used and acknowledged. They are there to support open
>> and
>> > academic research; open source projects fit well.
>> >
>> > If this sounds OK with the project PMCs, I can start the process of
>> > allocation, accounts creation, setup.
>> >
>> > I would also, as a CDO, of JICS sign whatever standard papers with
>> > the Apache organization.
>> >
>> > With all this being said, let me know please if this is something we
>> want
>> > to pursue.
>> >
>> > Thank you,
>> > Edmon
>> >
>> > On Tuesday, September 1, 2015, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> >
>> > > I spent a bunch of time looking at the Phi coprocessors and forgot to
>> get
>> > > back to the thread. I'd love it if someone spent some time looking at
>> > > leveraging them (since Drill is frequently processor bound).  Any
>> takers?
>> > >
>> > >
>> > >
>> > > --
>> > > Jacques Nadeau
>> > > CTO and Co-Founder, Dremio
>> > >
>> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <parthc@apache.org
>> > > <javascript:;>> wrote:
>> > >
>> > > > Hi Edmon,
>> > > >   Sorry no one seems to have got back to you on this.
>> > > >   We are in the process of publishing a test suite for regression
>> > testing
>> > > > Drill and the cluster you have (even a few nodes ) would be a great
>> > > > resource for folks to run the test suite. Rahul, et al are working
>> on
>> > > this
>> > > > and I would suggest watching out for Rahul's posts on the topic.
>> > > >
>> > > > Parth
>> > > >
>> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <ebegoli@gmail.com
>> > > <javascript:;>> wrote:
>> > > >
>> > > > > Hey folks,
>> > > > >
>> > > > > As we discussed today on a hangout, this is a machine that we
>> have at
>> > > > > JICS/NICS
>> > > > > where I have Drill installed and where I could set up a test
>> cluster
>> > > over
>> > > > > few nodes.
>> > > > >
>> > > > >
>> > >
>> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
>> > > > >
>> > > > > Note that each node is:
>> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
>> > > > > - 256 GB of memory
>> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
>> > > > > - 960 GB of SSD storage
>> > > > >
>> > > > > Would someone advise on what would be an interesting test setup?
>> > > > >
>> > > > > Thank you,
>> > > > > Edmon
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
Ted,

It is actually very easy and painless to do what I am proposing. I probably
made it sound far more bureaucratic/legalistic than it really is.

Researchers and projects from across the globe can apply for cycles on
Beacon or any other HPC platform we run. (Beacon is by far the best and we
already have a setup to run Spark and Hive on it. (We just published paper
about it at XSEDE on integrating PBS/TORQUE scheduler with Spark to run
JVM-bound jobs))

As for use of resources, at the end of year we need to submit reports for
all the projects that used compute resources and how.
It is part of our mission, as being one of the XSEDE centers, to
help promote the advancement of the science and technology.
Reports from Principal Investigators (PI) show how we did it. In this case,
I can be a PI and have any/someone from the Drill team assigned access.

I don't think there are any IP issues. Open source project, open research
institution, use of resources for testing and benchmarking. We could
actually make JICS a benchmarking site for Drill (and even other Apache
projects).

We'll discuss other details in a hangout. I am also planning to brief my
team next Wednesday on the plan for the use of resources.

Regards,
Edmon


On Saturday, September 5, 2015, Ted Dunning <te...@gmail.com> wrote:

> Edmon,
>
> This is very interesting.  I am sure that public acknowledgements of
> contributions are easily managed.
>
> What might be even more useful for you would be small scale publications,
> especially about the problems of shoe-horning real-world data objects into
> the quasi-relational model of Drill.
>
> What would be problematic (and what is probably just a matter of
> nomenclature) is naming of an institution by the Apache specific term
> "committer" (you said commitment). Individuals at your institution would
> absolutely be up for being committers as they demonstrate a track record of
> contribution.
>
> I would expect no need for any paperwork between JICS and Apache unless you
> would like to execute a corporate contributor license to ensure that
> particular individuals are specifically empowered to contribute code. I
> don't know that the position of JICS is relative to intellectual property,
> though, so it might be worth checking out institutional policy on your side
> on how individuals can contribute to open source projects. It shouldn't be
> too hard since there are quite a number of NSF funded people who do
> contribute.
>
>
>
>
>
> On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>> wrote:
>
> > I can work with my institution and the NSF that we committ the time on
> the
> > Beacon supercomputing cluster to Apache and the Drill project. Maybe 20
> > hours a month for 4-5 nodes.
> >
> > I have discretionary hours that I can put in, and I can, with our
> > HPC admins, create deploy scripts on few clustered machines (these are
> all
> > very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> > with local 1 TB SSD each). There is also Medusa 10 PB filesystem attached
> > but HDFS over local drives would probably be better.
> > They are otherwise just a regular machines, and run regular JVMs on
> Linux.
> >
> > We can also get Rahul an access with a secure token to setup
> > and run stress/performance/integration tests for Drill. I can actually
> help
> > there as well. This can be automated to run tests and collect results.
> >
> > I think that the only requirement would be that the JICS team be named
> for
> > commitment because both NSF/XSEDE and UT like to see the resources
> > being officially used and acknowledged. They are there to support open
> and
> > academic research; open source projects fit well.
> >
> > If this sounds OK with the project PMCs, I can start the process of
> > allocation, accounts creation, setup.
> >
> > I would also, as a CDO, of JICS sign whatever standard papers with
> > the Apache organization.
> >
> > With all this being said, let me know please if this is something we want
> > to pursue.
> >
> > Thank you,
> > Edmon
> >
> > On Tuesday, September 1, 2015, Jacques Nadeau <jacques@dremio.com
> <javascript:;>> wrote:
> >
> > > I spent a bunch of time looking at the Phi coprocessors and forgot to
> get
> > > back to the thread. I'd love it if someone spent some time looking at
> > > leveraging them (since Drill is frequently processor bound).  Any
> takers?
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <parthc@apache.org
> <javascript:;>
> > > <javascript:;>> wrote:
> > >
> > > > Hi Edmon,
> > > >   Sorry no one seems to have got back to you on this.
> > > >   We are in the process of publishing a test suite for regression
> > testing
> > > > Drill and the cluster you have (even a few nodes ) would be a great
> > > > resource for folks to run the test suite. Rahul, et al are working on
> > > this
> > > > and I would suggest watching out for Rahul's posts on the topic.
> > > >
> > > > Parth
> > > >
> > > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>
> > > <javascript:;>> wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > > As we discussed today on a hangout, this is a machine that we have
> at
> > > > > JICS/NICS
> > > > > where I have Drill installed and where I could set up a test
> cluster
> > > over
> > > > > few nodes.
> > > > >
> > > > >
> > >
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > > > >
> > > > > Note that each node is:
> > > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > > > - 256 GB of memory
> > > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> > > > > - 960 GB of SSD storage
> > > > >
> > > > > Would someone advise on what would be an interesting test setup?
> > > > >
> > > > > Thank you,
> > > > > Edmon
> > > > >
> > > >
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Ted Dunning <te...@gmail.com>.
Edmon,

This is very interesting.  I am sure that public acknowledgements of
contributions are easily managed.

What might be even more useful for you would be small scale publications,
especially about the problems of shoe-horning real-world data objects into
the quasi-relational model of Drill.

What would be problematic (and what is probably just a matter of
nomenclature) is naming of an institution by the Apache specific term
"committer" (you said commitment). Individuals at your institution would
absolutely be up for being committers as they demonstrate a track record of
contribution.

I would expect no need for any paperwork between JICS and Apache unless you
would like to execute a corporate contributor license to ensure that
particular individuals are specifically empowered to contribute code. I
don't know that the position of JICS is relative to intellectual property,
though, so it might be worth checking out institutional policy on your side
on how individuals can contribute to open source projects. It shouldn't be
too hard since there are quite a number of NSF funded people who do
contribute.





On Fri, Sep 4, 2015 at 9:39 PM, Edmon Begoli <eb...@gmail.com> wrote:

> I can work with my institution and the NSF that we committ the time on the
> Beacon supercomputing cluster to Apache and the Drill project. Maybe 20
> hours a month for 4-5 nodes.
>
> I have discretionary hours that I can put in, and I can, with our
> HPC admins, create deploy scripts on few clustered machines (these are all
> very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
> with local 1 TB SSD each). There is also Medusa 10 PB filesystem attached
> but HDFS over local drives would probably be better.
> They are otherwise just a regular machines, and run regular JVMs on Linux.
>
> We can also get Rahul an access with a secure token to setup
> and run stress/performance/integration tests for Drill. I can actually help
> there as well. This can be automated to run tests and collect results.
>
> I think that the only requirement would be that the JICS team be named for
> commitment because both NSF/XSEDE and UT like to see the resources
> being officially used and acknowledged. They are there to support open and
> academic research; open source projects fit well.
>
> If this sounds OK with the project PMCs, I can start the process of
> allocation, accounts creation, setup.
>
> I would also, as a CDO, of JICS sign whatever standard papers with
> the Apache organization.
>
> With all this being said, let me know please if this is something we want
> to pursue.
>
> Thank you,
> Edmon
>
> On Tuesday, September 1, 2015, Jacques Nadeau <ja...@dremio.com> wrote:
>
> > I spent a bunch of time looking at the Phi coprocessors and forgot to get
> > back to the thread. I'd love it if someone spent some time looking at
> > leveraging them (since Drill is frequently processor bound).  Any takers?
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <parthc@apache.org
> > <javascript:;>> wrote:
> >
> > > Hi Edmon,
> > >   Sorry no one seems to have got back to you on this.
> > >   We are in the process of publishing a test suite for regression
> testing
> > > Drill and the cluster you have (even a few nodes ) would be a great
> > > resource for folks to run the test suite. Rahul, et al are working on
> > this
> > > and I would suggest watching out for Rahul's posts on the topic.
> > >
> > > Parth
> > >
> > > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <ebegoli@gmail.com
> > <javascript:;>> wrote:
> > >
> > > > Hey folks,
> > > >
> > > > As we discussed today on a hangout, this is a machine that we have at
> > > > JICS/NICS
> > > > where I have Drill installed and where I could set up a test cluster
> > over
> > > > few nodes.
> > > >
> > > >
> > https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > > >
> > > > Note that each node is:
> > > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > > - 256 GB of memory
> > > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> > > > - 960 GB of SSD storage
> > > >
> > > > Would someone advise on what would be an interesting test setup?
> > > >
> > > > Thank you,
> > > > Edmon
> > > >
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Edmon Begoli <eb...@gmail.com>.
I can work with my institution and the NSF that we committ the time on the
Beacon supercomputing cluster to Apache and the Drill project. Maybe 20
hours a month for 4-5 nodes.

I have discretionary hours that I can put in, and I can, with our
HPC admins, create deploy scripts on few clustered machines (these are all
very large boxes with 16 cores, 256 GB, 40gb IB interconnect, and
with local 1 TB SSD each). There is also Medusa 10 PB filesystem attached
but HDFS over local drives would probably be better.
They are otherwise just a regular machines, and run regular JVMs on Linux.

We can also get Rahul an access with a secure token to setup
and run stress/performance/integration tests for Drill. I can actually help
there as well. This can be automated to run tests and collect results.

I think that the only requirement would be that the JICS team be named for
commitment because both NSF/XSEDE and UT like to see the resources
being officially used and acknowledged. They are there to support open and
academic research; open source projects fit well.

If this sounds OK with the project PMCs, I can start the process of
allocation, accounts creation, setup.

I would also, as a CDO, of JICS sign whatever standard papers with
the Apache organization.

With all this being said, let me know please if this is something we want
to pursue.

Thank you,
Edmon

On Tuesday, September 1, 2015, Jacques Nadeau <ja...@dremio.com> wrote:

> I spent a bunch of time looking at the Phi coprocessors and forgot to get
> back to the thread. I'd love it if someone spent some time looking at
> leveraging them (since Drill is frequently processor bound).  Any takers?
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <parthc@apache.org
> <javascript:;>> wrote:
>
> > Hi Edmon,
> >   Sorry no one seems to have got back to you on this.
> >   We are in the process of publishing a test suite for regression testing
> > Drill and the cluster you have (even a few nodes ) would be a great
> > resource for folks to run the test suite. Rahul, et al are working on
> this
> > and I would suggest watching out for Rahul's posts on the topic.
> >
> > Parth
> >
> > On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <ebegoli@gmail.com
> <javascript:;>> wrote:
> >
> > > Hey folks,
> > >
> > > As we discussed today on a hangout, this is a machine that we have at
> > > JICS/NICS
> > > where I have Drill installed and where I could set up a test cluster
> over
> > > few nodes.
> > >
> > >
> https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> > >
> > > Note that each node is:
> > > - 2x8-core Intel® Xeon® E5-2670 processors
> > > - 256 GB of memory
> > > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> > > - 960 GB of SSD storage
> > >
> > > Would someone advise on what would be an interesting test setup?
> > >
> > > Thank you,
> > > Edmon
> > >
> >
>

Re: Potential resource for large scale testing

Posted by Jacques Nadeau <ja...@dremio.com>.
I spent a bunch of time looking at the Phi coprocessors and forgot to get
back to the thread. I'd love it if someone spent some time looking at
leveraging them (since Drill is frequently processor bound).  Any takers?



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 31, 2015 at 10:24 PM, Parth Chandra <pa...@apache.org> wrote:

> Hi Edmon,
>   Sorry no one seems to have got back to you on this.
>   We are in the process of publishing a test suite for regression testing
> Drill and the cluster you have (even a few nodes ) would be a great
> resource for folks to run the test suite. Rahul, et al are working on this
> and I would suggest watching out for Rahul's posts on the topic.
>
> Parth
>
> On Tue, Aug 25, 2015 at 9:55 PM, Edmon Begoli <eb...@gmail.com> wrote:
>
> > Hey folks,
> >
> > As we discussed today on a hangout, this is a machine that we have at
> > JICS/NICS
> > where I have Drill installed and where I could set up a test cluster over
> > few nodes.
> >
> > https://www.nics.tennessee.edu/computing-resources/beacon/configuration
> >
> > Note that each node is:
> > - 2x8-core Intel® Xeon® E5-2670 processors
> > - 256 GB of memory
> > - 4 Intel® Xeon Phi™ coprocessors 5110P with 8 GB of memory each
> > - 960 GB of SSD storage
> >
> > Would someone advise on what would be an interesting test setup?
> >
> > Thank you,
> > Edmon
> >
>