Posted to dev@zeppelin.apache.org by karup1990 <gi...@git.apache.org> on 2016/02/04 16:23:40 UTC

[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

GitHub user karup1990 opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/695

    ZEPPELiN-654 Improvement to SparkSqlInterpreter

    ### What is this PR for?
    Improve the performance of SparkSqlInterpreter.
    Use StringBuilder instead of String concatenation when building the result string returned by SparkSqlInterpreter.
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    NA
    
    ### Is there a relevant Jira issue?
    ZEPPELIN-654
    
    ### How should this be tested?
    Run a Spark SQL query that returns a large number of rows (make sure `zeppelin.spark.maxResult` is set to a large enough value), once with and once without this fix.
    Note the time taken in each case.
    With this change, the time taken is significantly lower.
    
    ### Screenshots (if appropriate) 
    NA
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karup1990/incubator-zeppelin sqlc-imp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #695
    
----
commit 11789e832ccc2c1fff8f34afa35ea13661f80eac
Author: karuppayya <ka...@gmail.com>
Date:   2016-02-04T13:08:46Z

    use stringbuilder instead of string

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by karup1990 <gi...@git.apache.org>.

Github user karup1990 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-179935889
  
    When SparkSqlInterpreter returns the result, ZeppelinContext constructs a String from it for display in the UI.
    The logic that constructs the result string uses String concatenation for each column value in each row.
    If the result has x rows and each row has y columns, it creates x * y String objects while constructing the result string, and each concatenation copies everything accumulated so far.
    For large values of x and y, the number of String objects created is enormous and building the result is time consuming.
    Instead we could use StringBuilder, which appends into a single growable buffer, creates far fewer objects, and is faster.
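
    As a minimal sketch of the difference described above (not the actual `ZeppelinContext` code), the two helpers below build the same output string, assuming a Zeppelin-style `%table` result with tab-separated columns and newline-separated rows. `TableStringSketch`, `buildWithConcat` and `buildWithBuilder` are hypothetical names chosen for illustration.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical illustration; not the code changed by this PR.
    public class TableStringSketch {

      // Repeated String concatenation: every += allocates a new String
      // and copies everything accumulated so far.
      static String buildWithConcat(List<String> header, List<List<Object>> rows) {
        String msg = "%table " + String.join("\t", header) + "\n";
        for (List<Object> row : rows) {
          for (int i = 0; i < row.size(); i++) {
            msg += String.valueOf(row.get(i));
            msg += (i < row.size() - 1) ? "\t" : "\n";
          }
        }
        return msg;
      }

      // Same output, appended into one growable buffer and converted once.
      static String buildWithBuilder(List<String> header, List<List<Object>> rows) {
        StringBuilder msg = new StringBuilder("%table ");
        msg.append(String.join("\t", header)).append('\n');
        for (List<Object> row : rows) {
          for (int i = 0; i < row.size(); i++) {
            msg.append(row.get(i)); // append(Object) uses String.valueOf
            msg.append(i < row.size() - 1 ? '\t' : '\n');
          }
        }
        return msg.toString();
      }

      public static void main(String[] args) {
        List<String> header = Arrays.asList("n", "s");
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "a"),
            Arrays.<Object>asList(2, "b"));
        String viaConcat = buildWithConcat(header, rows);
        String viaBuilder = buildWithBuilder(header, rows);
        System.out.print(viaBuilder);
        System.out.println(viaConcat.equals(viaBuilder)); // prints true
      }
    }
    ```

    Both versions produce identical text; only the way the intermediate strings are built differs, which is why the change should not affect what the UI displays.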


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by Leemoonsoo <gi...@git.apache.org>.

Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-180702685
  
    Merge if there are no more discussions.


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by karup1990 <gi...@git.apache.org>.

Github user karup1990 commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-179905667
  
    This PR also removes some unused imports in `ZeppelinContext.java`.


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/695


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by corneadoug <gi...@git.apache.org>.

Github user corneadoug commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-179937643
  
    @karup1990 Could you provide a table showing the time differences for several result sizes?
    That would be quite useful for reviewing without having to run every possible case.
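
    A rough harness for producing such a table might look like the sketch below. It is hypothetical (`ConcatTimingTable` and its methods are illustrative names, the row sizes are arbitrary, and each row reuses the string from the Spark test later in this thread). It times each approach once with no JVM warm-up, so the numbers it prints are only indicative; a harness such as JMH would be needed for rigorous measurements.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical timing sketch: String += versus StringBuilder for a
    // two-column, tab-separated result of increasing size.
    public class ConcatTimingTable {

      static final String FILLER = "aaaaaaaaaaaabbbbbbbbbbbbcccccccccccddddddddd";

      // Rows shaped like the test table: (n: Int, s: String).
      static List<Object[]> makeRows(int count) {
        List<Object[]> rows = new ArrayList<>(count);
        for (int i = 1; i <= count; i++) {
          rows.add(new Object[] {i, FILLER});
        }
        return rows;
      }

      // Each += copies the whole accumulated string, so total work grows
      // roughly quadratically with the output size.
      static String concat(List<Object[]> rows) {
        String msg = "%table n\ts\n";
        for (Object[] row : rows) {
          for (int i = 0; i < row.length; i++) {
            msg += row[i] + (i < row.length - 1 ? "\t" : "\n");
          }
        }
        return msg;
      }

      // Appends into one buffer; total work grows roughly linearly.
      static String builder(List<Object[]> rows) {
        StringBuilder msg = new StringBuilder("%table n\ts\n");
        for (Object[] row : rows) {
          for (int i = 0; i < row.length; i++) {
            msg.append(row[i]).append(i < row.length - 1 ? '\t' : '\n');
          }
        }
        return msg.toString();
      }

      // Wall-clock time of a single run, in milliseconds.
      static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
      }

      public static void main(String[] args) {
        int[] sizes = {1_000, 2_000, 4_000, 8_000};
        System.out.println("rows\tString (ms)\tStringBuilder (ms)");
        for (int size : sizes) {
          List<Object[]> rows = makeRows(size);
          long concatMs = millis(() -> concat(rows));
          long builderMs = millis(() -> builder(rows));
          System.out.println(size + "\t" + concatMs + "\t" + builderMs);
        }
      }
    }
    ```

    Doubling the row count at each step makes the growth pattern easy to read off: if the copying explanation holds, the String column should grow much faster than the StringBuilder column.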


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by Leemoonsoo <gi...@git.apache.org>.

Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-180208608
  
    I have tested the following code
    
    ```
    %spark
    case class Test(n:Int, s:String)
    val data = sc.parallelize(1 to 10000).map(i=>Test(i, "aaaaaaaaaaaabbbbbbbbbbbbcccccccccccddddddddd")).toDF
    data.registerTempTable("test")
    ```
    
    and
    
    ```
    %sql
    select * from test
    ```
    
    took 20 sec before this patch but only 1 sec after this patch on my machine.
    A significant performance improvement.
    
    It looks good to me.


[GitHub] incubator-zeppelin pull request: ZEPPELiN-654 Improvement to Spark...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/695#issuecomment-180837044
  
    looks good