Posted to dev@flink.apache.org by Rohit Shinde <ro...@gmail.com> on 2015/06/27 16:20:26 UTC

Student looking to contribute to Stratosphere

Hello everyone,

I came across Stratosphere while looking for GSoC organisations working in
machine learning, and learned that it has since become Apache Flink.

I am interested in this project:
https://github.com/stratosphere/stratosphere/wiki/Google-Summer-of-Code-2014#implement-one-or-multiple-machine-learning-algorithms-for-stratosphere

Background: I am proficient in C++, Java, Python and Scheme. I have taken
undergrad courses in machine learning and data mining. How can I contribute
to the above project?

Thank you,
Rohit Shinde.

Re: Student looking to contribute to Stratosphere

Posted by Rohit Shinde <ro...@gmail.com>.

Okay!

Thank you!

On Wed, Jul 15, 2015 at 6:22 PM, Ufuk Celebi <uc...@apache.org> wrote:

> Hey Rohit,
>
> it's best to do the discussion related to a specific issue *in* the issue
> itself instead of the mailing list.
>
> In general, it's better to ask specific questions. But a general pointer
> would be to look into the existing ML algorithm implementations, Stephan's
> approximate PageRank implementation linked in the issue, and then think
> about how to translate it into the ML library. This would also be a first
> step to asking more specific questions.
>
> – Ufuk

Re: Student looking to contribute to Stratosphere

Posted by Ufuk Celebi <uc...@apache.org>.

Hey Rohit,

it's best to do the discussion related to a specific issue *in* the issue
itself instead of the mailing list.

In general, it's better to ask specific questions. But a general pointer
would be to look into the existing ML algorithm implementations, Stephan's
approximate PageRank implementation linked in the issue, and then think
about how to translate it into the ML library. This would also be a first
step to asking more specific questions.

– Ufuk
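
For readers unfamiliar with the algorithm mentioned above, here is a minimal
plain-Java sketch of standard PageRank by power iteration. It is not Stephan's
approximate implementation and it uses no Flink or flink-ml API. The class name
PageRankSketch, the adjacency-array input format, and the damping factor of
0.85 are assumptions chosen only for illustration.

```java
import java.util.Arrays;

public class PageRankSketch {

    /**
     * Plain power-iteration PageRank on a small in-memory graph.
     * links[i] lists the pages that page i links to (hypothetical input format).
     */
    public static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            for (int page = 0; page < n; page++) {
                if (links[page].length == 0) {
                    // A page without outgoing links spreads its rank over all pages.
                    danglingMass += rank[page];
                } else {
                    // Otherwise the rank is split evenly among the link targets.
                    double share = rank[page] / links[page].length;
                    for (int target : links[page]) {
                        next[target] += share;
                    }
                }
            }

            for (int page = 0; page < n; page++) {
                // Combine the random-jump term with the damped link contributions.
                next[page] = (1.0 - damping) / n
                        + damping * (next[page] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Page 0 links to pages 1 and 2, page 1 links to 2, page 2 links to 0.
        int[][] links = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Translating something like this into the ML library would mean expressing the
per-page loop as operations on a distributed data set instead of arrays; how
best to do that is the kind of specific question suited to the issue itself.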

On Wed, Jul 15, 2015 at 2:42 PM, Rohit Shinde <ro...@gmail.com>
wrote:

> I intend to solve this issue:
> https://issues.apache.org/jira/browse/FLINK-1748
>
> Could someone give me some pointers on how to approach this?

Re: Student looking to contribute to Stratosphere

Posted by Rohit Shinde <ro...@gmail.com>.

I intend to solve this issue:
https://issues.apache.org/jira/browse/FLINK-1748

Could someone give me some pointers on how to approach this?

On Wed, Jul 15, 2015 at 4:58 PM, Kostas Tzoumas <kt...@apache.org> wrote:

> IDE choice is up to you, with some limitations; see here for IDE setup
> instructions:
>
> https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/ide_setup.html
>
> Scala IDE is not limited to Scala; it is based on Eclipse, so you can also
> develop in Java. Most committers are using IntelliJ as far as I know.

Re: Student looking to contribute to Stratosphere

Posted by Kostas Tzoumas <kt...@apache.org>.

IDE choice is up to you, with some limitations; see here for IDE setup
instructions:
https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/ide_setup.html

Scala IDE is not limited to Scala; it is based on Eclipse, so you can also
develop in Java. Most committers are using IntelliJ as far as I know.

On Wed, Jul 15, 2015 at 1:24 PM, Rohit Shinde <ro...@gmail.com>
wrote:

> What IDE should I use? There are various options and I already have Eclipse
> Luna. The IDE page says that Scala IDE is the best option. So should I go
> with Scala IDE? Will I be able to develop in Java later?

Re: Student looking to contribute to Stratosphere

Posted by Rohit Shinde <ro...@gmail.com>.

What IDE should I use? There are various options and I already have Eclipse
Luna. The IDE page says that Scala IDE is the best option. So should I go
with Scala IDE? Will I be able to develop in Java later?

On Wed, Jul 15, 2015 at 4:44 PM, Kostas Tzoumas <kt...@apache.org> wrote:

> Hi Rohit,
>
> If you are just working on your laptop, I personally find it much easier to
> work without Hadoop and use the local file system or just Java collections
> for testing and trying out ideas.
>
> When you move to a cluster, it is common to use a Hadoop installation to
> store large files in HDFS. There, you can run Flink jobs using Flink's YARN
> mode.
>
> Kostas

Re: Student looking to contribute to Stratosphere

Posted by Kostas Tzoumas <kt...@apache.org>.

Hi Rohit,

If you are just working on your laptop, I personally find it much easier to
work without Hadoop and use the local file system or just Java collections
for testing and trying out ideas.

When you move to a cluster, it is common to use a Hadoop installation to
store large files in HDFS. There, you can run Flink jobs using Flink's YARN
mode.

Kostas

On Wed, Jul 15, 2015 at 8:22 AM, Márton Balassi <ba...@gmail.com>
wrote:

> Hi,
>
> Hadoop is not a necessity for running Flink, but rather an option. Try the
> steps of the setup guide. [1]
> If you really need HDFS, though, I would suggest having Hadoop on all the
> machines running Flink to get the best I/O performance.
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/setup_quickstart.html

Re: Student looking to contribute to Stratosphere

Posted by Márton Balassi <ba...@gmail.com>.

Hi,

Hadoop is not a necessity for running Flink, but rather an option. Try the
steps of the setup guide. [1]
If you really need HDFS, though, I would suggest having Hadoop on all the
machines running Flink to get the best I/O performance.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/setup_quickstart.html

On Jul 15, 2015 5:27 AM, "Rohit Shinde" <ro...@gmail.com> wrote:

> Hi,
>
> Sorry for the brief hiatus. I was preparing for my GRE exam, but I am back.
> I am starting to build Flink, and one question I have is whether a
> single-node Hadoop cluster configuration is enough. I assume Hadoop is
> needed since it is listed on the build page.

Re: Student looking to contribute to Stratosphere

Posted by Rohit Shinde <ro...@gmail.com>.

Hi,

Sorry for the brief hiatus. I was preparing for my GRE exam, but I am back.
I am starting to build Flink, and one question I have is whether a
single-node Hadoop cluster configuration is enough. I assume Hadoop is
needed since it is listed on the build page.

On Sat, Jun 27, 2015 at 8:02 PM, Chiwan Park <ch...@apache.org> wrote:

> Hi, you can choose any unassigned issue for the Flink Machine Learning
> Library (flink-ml) in JIRA. [1]
> There are some starter issues in flink-ml, such as FLINK-1737 [2],
> FLINK-1748 [3], and FLINK-1994 [4].
>
> First, it would be good to read some articles about contributing to
> Flink. [5][6]
> Once you have decided on an issue to work on, please assign it to yourself.
> If you don’t have permission to assign it, just comment on the issue.
> Someone will then give you permission and assign the issue to you.
>
> Regards,
> Chiwan Park
>
> [1] https://issues.apache.org/jira/
> [2] https://issues.apache.org/jira/browse/FLINK-1737
> [3] https://issues.apache.org/jira/browse/FLINK-1748
> [4] https://issues.apache.org/jira/browse/FLINK-1994
> [5] http://flink.apache.org/how-to-contribute.html
> [6] http://flink.apache.org/coding-guidelines.html

Re: Student looking to contribute to Stratosphere

Posted by Chiwan Park <ch...@apache.org>.

Hi, you can choose any unassigned issue for the Flink Machine Learning Library (flink-ml) in JIRA. [1]
There are some starter issues in flink-ml, such as FLINK-1737 [2], FLINK-1748 [3], and FLINK-1994 [4].

First, it would be good to read some articles about contributing to Flink. [5][6]
Once you have decided on an issue to work on, please assign it to yourself. If you don’t have
permission to assign it, just comment on the issue. Someone will then give you permission and
assign the issue to you.

Regards,
Chiwan Park

[1] https://issues.apache.org/jira/
[2] https://issues.apache.org/jira/browse/FLINK-1737
[3] https://issues.apache.org/jira/browse/FLINK-1748
[4] https://issues.apache.org/jira/browse/FLINK-1994
[5] http://flink.apache.org/how-to-contribute.html
[6] http://flink.apache.org/coding-guidelines.html

> On Jun 27, 2015, at 11:20 PM, Rohit Shinde <ro...@gmail.com> wrote:
> 
> Hello everyone,
> 
> I came across Stratosphere while looking for GSoC organisations working in
> machine learning, and learned that it has since become Apache Flink.
> 
> I am interested in this project:
> https://github.com/stratosphere/stratosphere/wiki/Google-Summer-of-Code-2014#implement-one-or-multiple-machine-learning-algorithms-for-stratosphere
> 
> Background: I am proficient in C++, Java, Python and Scheme. I have taken
> undergrad courses in machine learning and data mining. How can I contribute
> to the above project?
> 
> Thank you,
> Rohit Shinde.