Posted to user@spark.apache.org by Stuart Horsman <st...@gmail.com> on 2014/11/01 01:27:56 UTC

Re: SparkContext UI

Hi Sean/Sameer,

It seems you're both right.  In the python shell I need to explicitly call
data.cache() with the empty parens, then run an action, and it appears in the
storage tab.  In the scala shell I can just call data.cache without the
parens, run an action, and that works.
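
For anyone hitting the same thing, here is a minimal sketch of why the parens
matter in Python. It does not use Spark at all: FakeRDD is a hypothetical
stand-in class I made up for illustration, and its two flags are assumptions
that only mimic the "mark for caching, then materialise on the first action"
behaviour discussed in this thread.

```python
class FakeRDD:
    """Hypothetical stand-in for an RDD (not part of Spark)."""

    def __init__(self, rows):
        self.rows = rows
        self.marked_for_cache = False  # set by cache()
        self.materialized = False      # set by the first action after cache()

    def cache(self):
        # Only marks the data for caching; nothing is stored yet.
        self.marked_for_cache = True
        return self

    def count(self):
        # An action: this is the point where marked data would get stored.
        if self.marked_for_cache:
            self.materialized = True
        return len(self.rows)


data = FakeRDD(["a", "b", "c"])

# Without parens Python only looks up the bound method; it never calls it.
data.cache
data.count()
stored_without_parens = data.materialized

# With parens the method actually runs, and the next action stores the data.
data.cache()
data.count()
stored_with_parens = data.materialized

print(stored_without_parens, stored_with_parens)
```

In Scala a no-arg method can be invoked without the parens, which would
explain why data.cache works as-is in the scala shell.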

Thanks for your help.

Stu

On 31 October 2014 19:19, Sean Owen <so...@cloudera.com> wrote:

> No, empty parens do not matter when calling no-arg methods in Scala.
> This invocation should work as-is and should result in the RDD showing
> in Storage. I see that when I run it right now.
>
> Since it really does/should work, I'd look at other possibilities --
> is it maybe taking a short time to start caching? looking at a
> different/old Storage tab?
>
> On Fri, Oct 31, 2014 at 1:17 AM, Sameer Farooqui <sa...@databricks.com>
> wrote:
> > Hi Stuart,
> >
> > You're close!
> >
> > Just add a () after the cache, like: data.cache()
> >
> > ...and then run the .count() action on it and you should be good to see it
> > in the Storage UI!
> >
> >
> > - Sameer
> >
> > On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman <stuart.horsman@gmail.com>
> > wrote:
> >>
> >> Sorry, too quick to pull the trigger on my original email.  I should have
> >> added that I've tried using persist() and cache() but no joy.
> >>
> >> I'm doing this:
> >>
> >> data = sc.textFile("somedata")
> >>
> >> data.cache
> >>
> >> data.count()
> >>
> >> but I still can't see anything in the storage?
> >>
> >>
> >>
> >> On 31 October 2014 10:42, Sameer Farooqui <sa...@databricks.com> wrote:
> >>>
> >>> Hey Stuart,
> >>>
> >>> The RDD won't show up under the Storage tab in the UI until it's been
> >>> cached. Basically Spark doesn't know what the RDD will look like until it's
> >>> cached, b/c up until then the RDD is just on disk (external to Spark). If
> >>> you launch some transformations + an action on an RDD that is purely on
> >>> disk, then Spark will read it from disk, compute against it and then write
> >>> the results back to disk or show you the results at the scala/python shells.
> >>> But when you run Spark workloads against purely on disk files, the RDD won't
> >>> show up in Spark's Storage UI. Hope that makes sense...
> >>>
> >>> - Sameer
> >>>
> >>> On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman
> >>> <st...@gmail.com> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> When I load an RDD with:
> >>>>
> >>>> data = sc.textFile("somefile")
> >>>>
> >>>> I don't see the resulting RDD in the SparkContext gui on localhost:4040
> >>>> in /storage.
> >>>>
> >>>> Is there something special I need to do to allow me to view this?  I
> >>>> tried both the scala and python shells, but got the same result.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Stuart
> >>>
> >>>
> >>
> >
>