You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by kaspersorensen <gi...@git.apache.org> on 2018/11/23 16:40:18 UTC

[GitHub] metamodel pull request #194: METAMODEL-1205: JDK >8 support in core and exce...

GitHub user kaspersorensen opened a pull request:

    https://github.com/apache/metamodel/pull/194

    METAMODEL-1205: JDK >8 support in core and excel (with POI v 4.0.0)

    This is a step along the way to get to resolution of METAMODEL-1205 (supporting builds on JDKs > 8).
    
    The main change in this PR is that I upgraded Apache POI to version 4.0.0.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kaspersorensen/metamodel METAMODEL-1205-core-and-excel

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #194
    
----
commit e93f1e9070d9ad23f50143a83d5ee6083a3ab578
Author: Kasper Sørensen <i....@...>
Date:   2018-11-23T15:54:25Z

    METAMODEL-1205: Fixed assertion that depends on JDK's exception msg

commit e3b3b9ca240e462fa80c4ccf28175ce8e81f7c11
Author: Kasper Sørensen <i....@...>
Date:   2018-11-23T16:08:39Z

    METAMODEL-1205: Upgraded Apache POI to have JDK >8 support

commit faa0f88caa30563d1d46ddecd01e8819f3daffb6
Author: Kasper Sørensen <i....@...>
Date:   2018-11-23T16:38:18Z

    METAMODEL-1205: Fixed changes after POI upgraded to version 4.0.0
    
    I ran into an issue that reminded me of what we saw some time back [1]
    with writing to a file-based workbook failing with message "user-mapped
    section open". I found a reasonable workaround where workbooks are being
    loaded based on their InputStream rather than their File reference
    whenever we access the workbook in an update callback.
    
    [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=58480

----


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    No I think this is perfect. Thanks again for doing extra due diligence on the PR. I think we can merge ...
    
    Did you by any chance look at the two other JDK9+ related PRs?


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    I like it! Thanks a bunch for testing it.


---

[GitHub] metamodel pull request #194: METAMODEL-1205: JDK >8 support in core and exce...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/194


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    Okay, I'll give it a spin.
    
    But first, I'll hunt around for a large Excel file (or make one) :)


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    Okay, made an even  stupider test, this time doing multiple inserts, from multiple threads (I think we had problems with that on Excel previously) :)
    
    ```java
    import java.io.File;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    
    import org.apache.metamodel.excel.ExcelDataContext;
    import org.apache.metamodel.schema.Column;
    import org.apache.metamodel.schema.Schema;
    import org.apache.metamodel.schema.Table;
    
    public class ExcelTest {
        private final ExcelDataContext dataContext;
    
        private ExcelTest(ExcelDataContext dataContext) {
            this.dataContext = dataContext;
        }
    
        private void go() throws InterruptedException {
            long startTime = System.currentTimeMillis();
            ExecutorService executor = Executors.newFixedThreadPool(8);
    
            List<Callable<Long>> workers = Collections.nCopies(16, this::addStuff);
    
            try {
                final long combinedTime = executor.invokeAll(workers).stream().mapToLong(future -> {
                    try {
                        return future.get();
                    } catch (InterruptedException | ExecutionException e) {
                        throw new IllegalStateException(e);
                    }
                }).sum();
                long endTime = System.currentTimeMillis();
                System.out.println(
                        "Combined execution time: " + combinedTime + "ms, real time: " + (endTime - startTime) + "ms");
            } finally {
                executor.shutdown();
            }
    
        }
    
        private long addStuff() {
            long startTime = System.currentTimeMillis();
    
            Schema schema = dataContext.getDefaultSchema();
            Table table = schema.getTable(0);
            Column column1 = table.getColumn(0);
            Column column2 = table.getColumn(1);
            Column column3 = table.getColumn(2);
    
            char[] chars = "abcdefghijklmnopqrstuvwxyz".toCharArray();
            dataContext.executeUpdate(updateCallback -> {
                for (int i = 0; i < 100; i++) {
                    updateCallback.insertInto(table).value(column1, i).value(column2, i * 10)
                            .value(column3, chars[i % chars.length]).execute();
                }
            });
    
            long endTime = System.currentTimeMillis();
    
            final long time = endTime - startTime;
            System.out.println("Thread '" + Thread.currentThread().getName() + "' execution time: " + time + "ms");
    
            return time;
        }
    
        public static void main(String[] args) throws InterruptedException {
            ExcelTest excelTest = new ExcelTest(new ExcelDataContext(new File("C:\\Users\\Dennis\\Documents\\empty.xlsx")));
            excelTest.go();
        }
    }
    ```
    
    The test runs fine, at least I saw no issues over 10-15 runs. It's obvious that workers are blocked while others are writing, but that's fair, as long as nothing goes wrong.
    
    XLSX is really slow when inserting:
    ```
    Thread 'pool-1-thread-7' execution time: 2203ms
    Thread 'pool-1-thread-2' execution time: 3333ms
    Thread 'pool-1-thread-6' execution time: 4424ms
    Thread 'pool-1-thread-3' execution time: 5528ms
    Thread 'pool-1-thread-8' execution time: 6629ms
    Thread 'pool-1-thread-5' execution time: 7697ms
    Thread 'pool-1-thread-1' execution time: 8776ms
    Thread 'pool-1-thread-4' execution time: 9880ms
    Thread 'pool-1-thread-1' execution time: 2246ms
    Thread 'pool-1-thread-5' execution time: 4444ms
    Thread 'pool-1-thread-8' execution time: 6621ms
    Thread 'pool-1-thread-3' execution time: 8854ms
    Thread 'pool-1-thread-6' execution time: 11074ms
    Thread 'pool-1-thread-2' execution time: 13257ms
    Thread 'pool-1-thread-7' execution time: 15490ms
    Thread 'pool-1-thread-4' execution time: 8893ms
    Combined execution time: 119349ms, real time: 18827ms
    ```
    (The combined numbers gets quite inflated by the waiting)
    
    XLS is quite a bit faster:
    ```
    Thread 'pool-1-thread-7' execution time: 244ms
    Thread 'pool-1-thread-2' execution time: 260ms
    Thread 'pool-1-thread-5' execution time: 274ms
    Thread 'pool-1-thread-8' execution time: 287ms
    Thread 'pool-1-thread-3' execution time: 303ms
    Thread 'pool-1-thread-1' execution time: 319ms
    Thread 'pool-1-thread-1' execution time: 16ms
    Thread 'pool-1-thread-4' execution time: 352ms
    Thread 'pool-1-thread-6' execution time: 369ms
    Thread 'pool-1-thread-4' execution time: 35ms
    Thread 'pool-1-thread-1' execution time: 68ms
    Thread 'pool-1-thread-3' execution time: 118ms
    Thread 'pool-1-thread-8' execution time: 154ms
    Thread 'pool-1-thread-5' execution time: 188ms
    Thread 'pool-1-thread-2' execution time: 224ms
    Thread 'pool-1-thread-7' execution time: 264ms
    Combined execution time: 3475ms, real time: 566ms
    ```
    
    XLSX on the master branch was maybe a little bit faster, but I didn't really run it often enough to make sure:
    ```
    Thread 'pool-1-thread-3' execution time: 1779ms
    Thread 'pool-1-thread-1' execution time: 2903ms
    Thread 'pool-1-thread-4' execution time: 4040ms
    Thread 'pool-1-thread-2' execution time: 5155ms
    Thread 'pool-1-thread-8' execution time: 6249ms
    Thread 'pool-1-thread-5' execution time: 7354ms
    Thread 'pool-1-thread-6' execution time: 8440ms
    Thread 'pool-1-thread-7' execution time: 9552ms
    Thread 'pool-1-thread-6' execution time: 2214ms
    Thread 'pool-1-thread-5' execution time: 4377ms
    Thread 'pool-1-thread-8' execution time: 6600ms
    Thread 'pool-1-thread-2' execution time: 8795ms
    Thread 'pool-1-thread-4' execution time: 11014ms
    Thread 'pool-1-thread-1' execution time: 13242ms
    Thread 'pool-1-thread-3' execution time: 15466ms
    Thread 'pool-1-thread-7' execution time: 8781ms
    Combined execution time: 115961ms, real time: 18391ms
    ```
    
    XLS was very slow compared to the new branch, though:
    ```
    Thread 'pool-1-thread-3' execution time: 1061ms
    Thread 'pool-1-thread-4' execution time: 1896ms
    Thread 'pool-1-thread-7' execution time: 2743ms
    Thread 'pool-1-thread-1' execution time: 3582ms
    Thread 'pool-1-thread-5' execution time: 4438ms
    Thread 'pool-1-thread-8' execution time: 5276ms
    Thread 'pool-1-thread-6' execution time: 6170ms
    Thread 'pool-1-thread-2' execution time: 7016ms
    Thread 'pool-1-thread-6' execution time: 1682ms
    Thread 'pool-1-thread-8' execution time: 3432ms
    Thread 'pool-1-thread-5' execution time: 5119ms
    Thread 'pool-1-thread-1' execution time: 6836ms
    Thread 'pool-1-thread-7' execution time: 8505ms
    Thread 'pool-1-thread-4' execution time: 10180ms
    Thread 'pool-1-thread-3' execution time: 11874ms
    Thread 'pool-1-thread-2' execution time: 6791ms
    Combined execution time: 86601ms, real time: 13865ms
    ```
    
    To make sure, I checked the Excel documents, and everything seems correct (here the blocking is also obvious, as values are progressing from low to high values, with no intertwined values from other threads).
    
    I think the conclusion must be that in general it works as it should, and with quite a good boost on XLS speed, _maybe_ at the cost of a little bit of writing speed on the XLXS side.


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    D'oh, of course they're blocking, I'm doing the loop inside an update script! Let me know if you want me to change it. :-)


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    Yeah, I agree, everything looks good.
    
    Nope, not really. I'll take a look in a moment. 


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    I think our tests should cover most of the stuff, but the one thing that I will admit was somewhat complicated was what I noted in the third commit:
    
    > I ran into an issue that reminded me of what we saw some time back [1]
    > with writing to a file-based workbook failing with message "user-mapped
    > section open". I found a reasonable workaround where workbooks are being
    > loaded based on their InputStream rather than their File reference
    > whenever we access the workbook in an update callback.
    > 
    > [1] https://bz.apache.org/bugzilla/show_bug.cgi?id=58480
    
    Or in terms of testing ... It would be interesting to compare performance and ability to do multiple inserts into the same sheet. I have a small fear that this new approach is slower because it is instantiating the Workbook based on `InputStream` instead of `File`. But OTOH I don't know of an alternative, because the `File` based thing is creating file system locks that prevent those inserts from happening without having to do very silly sleeps or something like that.


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    LGTM, but I haven't tested yet. Guess such a big jump should have some kind of manual test. Anything I should look specifically at trying out?


---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    Made a (bit stupid) test:
    ```Java
    import org.apache.metamodel.data.DataSet;
    import org.apache.metamodel.excel.ExcelDataContext;
    import org.apache.metamodel.schema.Column;
    import org.apache.metamodel.schema.Schema;
    import org.apache.metamodel.schema.Table;
    
    import java.io.File;
    
    public class Main {
        public static void main(String[] args) {
            long startTime = System.currentTimeMillis();
    
            ExcelDataContext excelDataContext = new ExcelDataContext(new File("C:\\Users\\Dennis\\Documents\\testlarge.xlsx"));
            Schema schema = excelDataContext.getDefaultSchema();
            Table table = schema.getTable(0);
            Column column = table.getColumn(0);
            Column column2 = table.getColumn(1);
    
    
            DataSet ds = excelDataContext.query().from(table).select(column).where(column2).gt(2).execute();
    
            long rows = 0;
            while (ds.next()) {
                rows++;
                if("NOWAY".equals(ds.getRow().getValue(0))) {
                    throw new RuntimeException("NO WAY!");
                }
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Total execution time: " + (endTime-startTime) + "ms, for " + rows + " rows");
    
        }
    }
    ```
    (Applied the `where` to give MM a little bit of calculating work, and did the NOWAY test just to ensure that Java doesn't optimize the loop out. It's been too long for me to remember a better way)
    
    The source sheet is 2 columns of numbers, one of text, and 610560 rows of it on XLSX, 65536 on XLS (maximum). I newer really did anything with the third column, but I guess we're mostly testing data loading anyway.
    
    It looks like improvements for XLS, and a toss-up for XLSX to me (scale is milliseconds). I could average to get a winner there, but I'm pretty sure there's not enough numbers anyway. Though it looks like the old one is maybe a bit more peaky (also noticed that when I, errrrr, tested my test):
    ![image](https://user-images.githubusercontent.com/1282832/48960652-9df01c80-ef6e-11e8-9894-4ea505f13493.png)



---

[GitHub] metamodel issue #194: METAMODEL-1205: JDK >8 support in core and excel (with...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/194
  
    D'oh! Forgot about the multiple inserts.
    
    It's getting a bit late here, but I'll try those out Sunday (family visit tomorrow).


---