Posted to reviews@spark.apache.org by carlosfuertes <gi...@git.apache.org> on 2014/09/07 17:56:38 UTC

[GitHub] spark pull request: [SPARK-2017] [SPARK-2016] Web UI responsivenes...

Github user carlosfuertes commented on the pull request:

    https://github.com/apache/spark/pull/1682#issuecomment-54750881
  
    Hi,
    
    I have updated the title of the pull request and made sure it is mergeable
    after the latest master updates. Lastly, I have dropped the usage of a custom
    table CSS since, as I explained in
    https://issues.apache.org/jira/browse/SPARK-2017, it is not really the
    bottleneck, and dropping it may simplify things at first (that's a later tweak).
    
    The reason I added the env variable "spark.ui.jsRenderingEnabled" and
    retained HTML server rendering in this PR is to ensure that folks who rely on
    not having to use JavaScript can still operate. They would just need to launch
    Spark with "spark.ui.jsRenderingEnabled" set to false.
    
    Later, anybody who relies on not using JavaScript should access the info
    using the JSON interface. But until people are using the JSON interface, we
    should keep the current minimal HTML form.
    
    Right now I see this pull request as a working proof of concept of what the
    JSON interface and JavaScript can look like. There are still some points to
    discuss and agree on:
    
    1) What is the JSON format that we want?
         The current JSON is very verbose and therefore inefficient: it
    sends every row as a set of key-value pairs, so the field names are
    repeated unnecessarily for every row:
    
    [ {
      "Index" : 0,
      "ID" : 0,
      "Attempt" : {
        "value" : "0",
        "sorttable_customkey" : "0"
      },
      "Status" : "SUCCESS",
      "Locality Level" : "PROCESS_LOCAL",
      "Executor" : "localhost",
      "Launch Time" : "2014/09/07 15:25:24",
      "Duration" : {
        "value" : "0.8 s",
        "sorttable_customkey" : "780"
      },
      "GC Time" : "",
      "Errors" : ""
    }, ...
    
    We could use a much more compact format where the first line holds the names
    of the fields and every following line is just an array of the values of those
    fields (no repetition of keys).
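
    As a rough sketch of that compact layout (Python is used purely for
    illustration; the function name "to_compact" and the "columns"/"rows"
    keys are my assumptions, not part of this PR), the conversion could
    look like this:

```python
import json


def to_compact(records):
    """Turn a list of per-row dicts into a columns-plus-rows layout.

    Hypothetical helper: assumes every record has the same keys as the first.
    """
    if not records:
        return {"columns": [], "rows": []}
    # The column order is taken from the first record.
    columns = list(records[0].keys())
    # Each row becomes a plain array of values, in column order.
    rows = [[record.get(name) for name in columns] for record in records]
    return {"columns": columns, "rows": rows}


# Trimmed-down rows in the style of the verbose sample above.
verbose = [
    {"Index": 0, "ID": 0, "Status": "SUCCESS", "Executor": "localhost"},
    {"Index": 1, "ID": 1, "Status": "SUCCESS", "Executor": "localhost"},
]
compact = to_compact(verbose)
print(json.dumps(compact, indent=2))
```

    The field names then appear once per response instead of once per
    row, so the saving grows with the number of rows.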
    
    Also we need to include tests for the JSON interface.
    
    
    2) If we want to deal with and render really big tables, I think we should
    include pagination and update the web UI to use it.
    
    In the JSON interface, we should include a parameter that tells the server
    how many rows to return, for example
    "/storage/rdd/blocks/json/?id=0&nrows=1000". If you want to get
    everything, you say so explicitly, for example
    "/storage/rdd/blocks/json/?id=0&nrows=all"
    
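    A minimal sketch of how the server side could interpret "nrows"
    (Python for illustration only; "limit_rows" and the default of 1000
    rows are assumptions, and the real handler would live in the Spark
    UI code):

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_NROWS = 1000  # assumed default when the parameter is absent


def limit_rows(url, rows):
    """Return the rows selected by the 'nrows' query parameter of url."""
    query = parse_qs(urlparse(url).query)
    raw = query.get("nrows", [str(DEFAULT_NROWS)])[0]
    if raw == "all":
        # The caller asked for everything explicitly.
        return list(rows)
    count = int(raw)  # raises ValueError on anything that is not an integer
    if count < 0:
        raise ValueError("nrows must be a non-negative integer or 'all'")
    return list(rows[:count])


rows = list(range(5000))
first_page = limit_rows("/storage/rdd/blocks/json/?id=0&nrows=1000", rows)
everything = limit_rows("/storage/rdd/blocks/json/?id=0&nrows=all", rows)
print(len(first_page), len(everything))
```

    Real pagination would also need something like an offset or page
    number, which is left out here.
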
    
    
    3) The current method of sorting the output in the browser, using the
    "sorttable" JS package, does not work for large tables. It is too slow.
    
    When we request the data, the server should do the sorting.
    
    That is, the JSON API should receive a parameter telling the server
    which column to sort by, for example
    "/storage/rdd/blocks/json/?id=0&sortby=0"
    
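    To illustrate server-side sorting by column index, here is a small
    sketch (Python for illustration only; "sort_rows", "sort_key" and
    the "descending" flag are hypothetical names). It assumes that cells
    carrying a "sorttable_customkey", as in the sample above, should be
    ordered by that key as a number, and that the other values in a
    column are mutually comparable:

```python
def sort_key(cell):
    """Return the value to sort a cell by."""
    if isinstance(cell, dict) and "sorttable_customkey" in cell:
        # The custom keys in the sample are numbers stored as strings.
        return float(cell["sorttable_customkey"])
    return cell


def sort_rows(records, sortby, descending=False):
    """Sort per-row dicts by the column at position 'sortby'."""
    if not records:
        return []
    column = list(records[0].keys())[sortby]
    return sorted(records, key=lambda r: sort_key(r[column]), reverse=descending)


records = [
    {"Index": 0, "Duration": {"value": "0.8 s", "sorttable_customkey": "780"}},
    {"Index": 1, "Duration": {"value": "95 ms", "sorttable_customkey": "95"}},
    {"Index": 2, "Duration": {"value": "2 s", "sorttable_customkey": "2000"}},
]
by_duration = sort_rows(records, sortby=1)
print([r["Index"] for r in by_duration])
```

    Converting the custom key to a number matters: compared as strings,
    "2000" would sort before "780" and "95".
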
    
    Let me know what you think about the points above.
    
    
    
    
    
    
    
    On Sat, Sep 6, 2014 at 3:44 PM, Josh Rosen <no...@github.com> wrote:
    
    > @ash211 <https://github.com/ash211> This weekend, I'm actually working on
    > writing a design document for web UI improvements in Spark 1.2. SSL
    > encryption, authentication, and ACLs are all features that I'm planning to
    > put on the roadmap.
    >
    > Do you have SSH access to your EC2 machines? One option is to use an SSH proxy to
    > view the full web UI in your browser. Once you've set up the proxy, you can
    > use a browser plugin like FoxyProxy <http://getfoxyproxy.org/> to
    > seamlessly proxy requests for the UI.

