Posted to dev@spark.apache.org by Ivan Sadikov <iv...@gmail.com> on 2017/07/18 07:56:52 UTC

Spark history server running on Mongo

Hello everyone!

I have been working on a Spark history server that uses MongoDB as a
datastore for processed events, iterating on the idea that the Spree project
uses for the Spark UI. The project was originally designed to improve on the
standalone history server with a reduced memory footprint.

Project lives here: https://github.com/lightcopy/history-server

These are just very early days of the project, sort of pre-alpha (some
features are missing, and metrics in some failed-job cases are
questionable). Code is being tested on several 8 GB and 2 GB logs and aims to
lower resource usage since we run the history server together with several
other systems.

Would greatly appreciate any feedback on repository (issues/pull
requests/suggestions/etc.). Thanks a lot!


Cheers,

Ivan

Re: Spark history server running on Mongo

Posted by Ivan Sadikov <iv...@gmail.com>.

Yes, you are absolutely right, though the UI does not change often, and it
potentially allows faster iteration, IMHO, which is why I started working on
this. For me, it felt like this functionality could easily be outsourced to
a separate project.

And, as you pointed out, I did add some small fixes to the UI, but still have
a fair amount of them to add. It is all mentioned in the repo readme, by the way.

Thanks for clearing things up.
Have a good day!

Ivan

On Thu, 20 Jul 2017 at 5:52 AM, Marcelo Vanzin <va...@cloudera.com> wrote:

> On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov <iv...@gmail.com> wrote:
> > Repository that I linked to does not require rebuilding Spark and could be
> > used with current distribution, which is preferable in my case.
>
> Fair enough, although that means that you're re-implementing the Spark
> UI, which makes that project have to constantly be modified to keep up
> with UI changes in Spark (or create its own UI and forget about what
> Spark does). Which is what Spree does too.
>
> In the long term I believe having these sort of enhancements in Spark
> itself would benefit more people.
>
> --
> Marcelo
>

Re: Spark history server running on Mongo

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov <iv...@gmail.com> wrote:
> Repository that I linked to does not require rebuilding Spark and could be
> used with current distribution, which is preferable in my case.

Fair enough, although that means that you're re-implementing the Spark
UI, which makes that project have to constantly be modified to keep up
with UI changes in Spark (or create its own UI and forget about what
Spark does). Which is what Spree does too.

In the long term I believe having these sort of enhancements in Spark
itself would benefit more people.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: Spark history server running on Mongo

Posted by Ivan Sadikov <iv...@gmail.com>.

Hi Marcelo,

Thanks for the reference, again. I looked at your code - really great work!
I had to replace the Spark distribution to use it, though; I could not figure
out how to build it separately.

Repository that I linked to does not require rebuilding Spark and could be
used with current distribution, which is preferable in my case.


Kind regards,

Ivan



On Wed, 19 Jul 2017 at 4:44 AM, Ivan Sadikov <iv...@gmail.com> wrote:

> Thanks for JIRA ticket reference! Frankly, I was aware of this work, but
> didn't know that there was an API for storage implementation.
>
> Will try exploring that as well, thanks!
> On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
>
>> See SPARK-18085. That has much of the same goals re: SHS resource
>> usage, and also provides a (currently non-public) API where you could
>> just create a MongoDB implementation if you want.
>>
>> On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <iv...@gmail.com> wrote:
>> > Hello everyone!
>> >
>> > I have been working on Spark history server that uses MongoDB as a datastore
>> > for processed events to iterate on idea that Spree project uses for Spark
>> > UI. Project was originally designed to improve on standalone history server
>> > with reduced memory footprint.
>> >
>> > Project lives here: https://github.com/lightcopy/history-server
>> >
>> > These are just very early days of the project, sort of pre-alpha (some
>> > features are missing, and metrics in some failed jobs cases are
>> > questionable). Code is being tested on several 8gb and 2gb logs and aims to
>> > lower resource usage since we run history server together with several other
>> > systems.
>> >
>> > Would greatly appreciate any feedback on repository (issues/pull
>> > requests/suggestions/etc.). Thanks a lot!
>> >
>> >
>> > Cheers,
>> >
>> > Ivan
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>

Re: Spark history server running on Mongo

Posted by Ivan Sadikov <iv...@gmail.com>.

Thanks for JIRA ticket reference! Frankly, I was aware of this work, but
didn't know that there was an API for storage implementation.

Will try exploring that as well, thanks!
On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin <va...@cloudera.com> wrote:

> See SPARK-18085. That has much of the same goals re: SHS resource
> usage, and also provides a (currently non-public) API where you could
> just create a MongoDB implementation if you want.
>
> On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <iv...@gmail.com> wrote:
> > Hello everyone!
> >
> > I have been working on Spark history server that uses MongoDB as a datastore
> > for processed events to iterate on idea that Spree project uses for Spark
> > UI. Project was originally designed to improve on standalone history server
> > with reduced memory footprint.
> >
> > Project lives here: https://github.com/lightcopy/history-server
> >
> > These are just very early days of the project, sort of pre-alpha (some
> > features are missing, and metrics in some failed jobs cases are
> > questionable). Code is being tested on several 8gb and 2gb logs and aims to
> > lower resource usage since we run history server together with several other
> > systems.
> >
> > Would greatly appreciate any feedback on repository (issues/pull
> > requests/suggestions/etc.). Thanks a lot!
> >
> >
> > Cheers,
> >
> > Ivan
> >
>
>
>
> --
> Marcelo
>

Re: Spark history server running on Mongo

Posted by Marcelo Vanzin <va...@cloudera.com>.

See SPARK-18085. That has much of the same goals re: SHS resource
usage, and also provides a (currently non-public) API where you could
just create a MongoDB implementation if you want.

On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <iv...@gmail.com> wrote:
> Hello everyone!
>
> I have been working on Spark history server that uses MongoDB as a datastore
> for processed events to iterate on idea that Spree project uses for Spark
> UI. Project was originally designed to improve on standalone history server
> with reduced memory footprint.
>
> Project lives here: https://github.com/lightcopy/history-server
>
> These are just very early days of the project, sort of pre-alpha (some
> features are missing, and metrics in some failed jobs cases are
> questionable). Code is being tested on several 8gb and 2gb logs and aims to
> lower resource usage since we run history server together with several other
> systems.
>
> Would greatly appreciate any feedback on repository (issues/pull
> requests/suggestions/etc.). Thanks a lot!
>
>
> Cheers,
>
> Ivan
>



-- 
Marcelo

