You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by tomatophantastico <gi...@git.apache.org> on 2017/07/21 22:07:28 UTC

[GitHub] metamodel pull request #148: Feature/faster join

GitHub user tomatophantastico opened a pull request:

    https://github.com/apache/metamodel/pull/148

    Feature/faster join

    A simple nested loop join implementation to deal with METAMODEL-1144.
    
    This is an okay-ish solution:
    it maintains the results of the joins, but wont scale too well
    no join ordering or similar

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tomatophantastico/metamodel feature/fasterJoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #148
    
----
commit 65574d345bec251428868b24dc0f9322b3c93e14
Author: Jörg Unbehauen <jo...@unbehauen.net>
Date:   2017-07-20T15:01:02Z

    guava is now available in core

commit ee2b91671d8cb6b35046aabba8426724831b7205
Author: Jörg Unbehauen <jo...@unbehauen.net>
Date:   2016-05-03T10:34:03Z

    Simple nested loop join implementation

commit ef5ac06f17937365603e2be2eb1333c9d1407062
Author: Jörg Unbehauen <jo...@unbehauen.net>
Date:   2017-07-18T13:28:30Z

    Test cases for the join implementation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/148#discussion_r129790017
  
    --- Diff: core/src/test/java/org/apache/metamodel/MetaModelHelperTest.java ---
    @@ -115,21 +115,16 @@ public void testRightJoin() throws Exception {
     
         public void testSimpleCarthesianProduct() throws Exception {
             DataSet dataSet = MetaModelHelper.getCarthesianProduct(createDataSet1(), createDataSet2());
    -
    +        List<String> results = new ArrayList<String>();
    +        
    +        while(dataSet.next()){
    +          results.add(dataSet.getRow().toString());
    +        }
             assertEquals(2, dataSet.getSelectItems().length);
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[f, b]]", dataSet.getRow().toString());
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[f, a]]", dataSet.getRow().toString());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[o, r]]", dataSet.getRow().toString());
    -        assertFalse(dataSet.next());
    +        assertEquals(9, results.size());
    +        assertTrue(results.contains("Row[values=[f, b]]"));
    +        assertTrue(results.contains("Row[values=[f, a]]"));
    +        assertTrue(results.contains("Row[values=[o, r]]"));
    --- End diff --
    
    true.
    
    There are a lot of Arrays.toString() comparisons in other test, which should not be there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    I do not think the changes i proposed about a year ago were merged, as they broke some tests. Or did i miss sth here? 
    
    The proposed change is still ~10 times faster, depending on the dataset size.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    My 2 cents regarding helpers (without having looked at the actual code): While any kind of utility or helper class is an anti pattern, if the JoinHelper centers around the same subject and MetaModelHelper does not, it's much preferable to keep them separate. IMHO, we'd want to remove methods from MetaModelHelper (to the point it ceases to be at some point), not add to it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    The point about using collections instead of arrays is a good old wishlist item :-)
    https://issues.apache.org/jira/browse/METAMODEL-7
    
    If you're up for it, I am 100% behind it. But it's a LOT of work I think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/148#discussion_r129737407
  
    --- Diff: core/src/test/java/org/apache/metamodel/MetaModelHelperTest.java ---
    @@ -115,21 +115,16 @@ public void testRightJoin() throws Exception {
     
         public void testSimpleCarthesianProduct() throws Exception {
             DataSet dataSet = MetaModelHelper.getCarthesianProduct(createDataSet1(), createDataSet2());
    -
    +        List<String> results = new ArrayList<String>();
    +        
    +        while(dataSet.next()){
    +          results.add(dataSet.getRow().toString());
    +        }
             assertEquals(2, dataSet.getSelectItems().length);
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[f, b]]", dataSet.getRow().toString());
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[f, a]]", dataSet.getRow().toString());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertTrue(dataSet.next());
    -        assertEquals("Row[values=[o, r]]", dataSet.getRow().toString());
    -        assertFalse(dataSet.next());
    +        assertEquals(9, results.size());
    +        assertTrue(results.contains("Row[values=[f, b]]"));
    +        assertTrue(results.contains("Row[values=[f, a]]"));
    +        assertTrue(results.contains("Row[values=[o, r]]"));
    --- End diff --
    
    Now that we're not asserting on order anymore, we should probably assert on the existance of all 9 records. Otherwise we've somehow easened the assertion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    Sorry, I can't remember your previous change :-) What I am asking for is if you can point out why this is faster. I'm maybe just not seeing it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    I can't see how this is faster than the previous solution? To me it looks like a refactor of it, so it must be same speed.
    
    I mean, the refactor may have some merit, but just trying to ensure if I get it right. Is there something about this change that would make it faster?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    Actually this should be slower than the previous solution as i removed the join ordering and changed the way the filter/ join condition is evaluated.
    
    Both made some assumptions i was not sure if they are right, so i try to go a safer route instead.
    
    On the other hand, this implementation plays nice with the existing testcases. The existing testcases check for string equivalence of DataSet.toString().


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    Imho, would not refer to utility classs in general as an anti-pattern, you really have to look at it use-case dependent.
    
    In this case however, i agree that Metamodelhelper should be replaced.
    
     MetamodelHelper appears to do two things:
    (a)
    Execute some relation operation on the QueryPostprocessDataContext (order/group/join/...)
    (b)
    Deal with the usage of array/List/Iterable in apis
    
    I suggest for clearing things up, i move (a) first in a separate static class. From there on more profound ways for executing those operations can be defined.
    
    For dealing with (b): Some of them are not used in MM anyway, so they might be as well deleted in MM 6.x.x, may deprecate them in MM 5.x.x?
    For the other, i would really prefer to clean up the api and switch to Collection/Lists/Sets and Streams instead of arrays and iterables instead of those conversion/filtering methods.
    This is also rather big, so MM 6.x.x


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    So we should close this PR then, if it's going to be slower?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    Ok, this is literally meddling with the core of Metamodel.
    
    I'll refactor this pull request, such that is as minimally invasive as possible: The API doc also guarantees MetaModelHelper to be stable.
    
    For more experimental changes, i'll create an other pull request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    Ah yes you're right. That should give a good boost in performance in deed!
    
    Functionally the code looks good to me. I am not 100% sure about what I think about the JoinHelper class. It's only made up of static methods which means that all it's methods could just as well be directly part of MetaModelHelper. And MetaModelHelper is IMO already a kind of a junkdrawer of static methods, so I don't think I want another :-) On the other hand, something that would maybe be nice is if you made the JoinHelper class non-static so that you would instantiate it like this:
    ```
    DataSetJoiner joiner = new DataSetJoiner(fromItemWithJoin);
    DataSet dataSet = joiner.join(dataSets);
    ```
    (or something quite similar to this)
    
    I think then we would have a class that's quite nicely built and scoped such that it cannot be used out of context (statically). So that would be my suggestion for a more pretty design :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    This PR looks very good to me. I'm gonna go ahead and format the files slightly and then merge it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #148: Feature/faster join

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/148


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #148: Feature/faster join

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/148
  
    The previous method materializes the cartesian product and then filters it.
    The proposed changes applies the filters in the during the joining.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---