You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by kaspersorensen <gi...@git.apache.org> on 2017/10/01 04:19:20 UTC

[GitHub] metamodel pull request #163: METAMODEL-1165: Fixed queries w/o table names (...

GitHub user kaspersorensen opened a pull request:

    https://github.com/apache/metamodel/pull/163

    METAMODEL-1165: Fixed queries w/o table names (syntax: "FROM tables[0]")

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kaspersorensen/metamodel METAMODEL-1165-table-index-querying

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #163
    
----
commit 43f1be1c29ab9bedb9a055204ef6284252d897ea
Author: Kasper Sørensen <i....@gmail.com>
Date:   2017-10-01T04:18:29Z

    METAMODEL-1165: Fixed queries w/o table names (syntax: "FROM tables[0]")

----


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    I'm not opposed to _also_ adding better support for choosing your own table names and such, but yes, the idea behind using the filename as table name was to allow expansion of the logical DataContext either via CompositeDataContext or maybe by a future implementation where a folder of files could be opened as one schema.
    
    When you mention "alias in the data context", you mean another Table with TableType=ALIAS? I can see how that would make it really clear. OTOH that would then have to be implemented on a data context basis rather than as a generic construct in the parser. If I can draw a parallel to `SELECT *` then I would say that this is a parsing/query level convenience that I find really useful whenever I'm feeling lazy and don't want to figure out column names. I am kind of hoping for something similar just for table names.


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    To me, there are two use cases / requirements here:
    
    1. Make queries transferable
    2. Ease of exploring a schema and its data via SQL
    
    w.r.t 1)
    Yes, i think the data contexts are the place where this should be implemented. As most (all?) non-jdbc datacontexts are extending QueryPostProcessingContext, this could perhaps be done without too many changes.
    
    i generally think it is not a good idea to deal with a table "Backup_23423_hash_by_Jörg_do_not_modify.csv" in the query, especially if it is an export of the lets say person table. Having an alias would clear up the semantics of the query.
    
    "select * from person" will work on the source system, "select * from tables[0]" not. 
    
    
    w.r.t. 2)
    Just iterating over tables without having the table name, does not really make sense to me. How should the result of such a query look like?
    
    Most DBMs implement this via custom commands (list tables/documents...) or using tables (SELECT table_schema,table_name FROM information_schema.tables).
    
    Maybe we can cook up sth. more convenient here. But i would prefer to change the semantics of SQL.
    
    
    
    



---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    Every list IS ordered. That's part of the contract of the List  interface... 


---

[GitHub] metamodel pull request #163: METAMODEL-1165: Fixed queries w/o table names (...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen closed the pull request at:

    https://github.com/apache/metamodel/pull/163


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    On the way home now, will take a proper look when I'm home. 


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    Before I go ahead and merge this based on Lazy Concensus, I'd like to ask one more time if anyone has any feedback on the new feature. To me it seems pretty cool, but others may feel otherwise.
    
    @LosD maybe a comment? Or @ardlema?


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    I don't think that the right way forward is to let it be "up to me" or anyone else for that matter. We should find some kind of concensus or agreement, at least between the people who engage in MetaModel's direction. So from that angle, I think we should either discuss more or drop the feature :-) Just to confirm then - you would be more in favor of introducing a keyword specifically for single-table datastores?


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    OK I guess I can get behind using an ALIAS table too. My only worries are around (1) user interfaces such as DataCleaner's which may now become cluttered with additional tables and (2) composite datacontexts built from e.g. 2 csv files, which would then have two ALIAS tables of same name. But both concerns can be fixed, I think, by filtering out irrelevant table types.


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by tomatophantastico <gi...@git.apache.org>.
Github user tomatophantastico commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    IMHO: i prefer the query language to be free of "special constructs" as much as possible. 
    
    I encoutered a similar problem as described in mm-1165 and my workaround was to construct the query programatically.
    But i was not able to externalize the SQL string with CSVs, as the schema depends on the serialisation of the data, e.g. the table name depends on the file name.
    
    To break this dependency and allow reuse of SQL, a better idea would be to break the tablename/filename connection and allow either 
    - a custom table name on constructing single table data contexts
    - a better default value on constructing single table data contexts
    -  an alias in the data context.
    
    What is the reason using the filename for tables? To avoid clashes in the composite DataContext?


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    Something like that sounds like a better idea to me, yes. 
    
    Suggestions:
    1. Keep the format, but reject it for all multi-table Datastores. Might of course be a little weird, interface-wise (an array selector that only works when there's no array :slightly_smiling_face:).
    2. If the parser can handle it, just allow single-table data to skip the FROM. A bit odd-looking to me though.
    3. Allow something like "default_table" as well as the actual table name for single-table datastores.
    4. Require the FROM-clause, but ignore the table name for single-table datastores.


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    Anyway, up to you. I still think it's a horrible idea, a small convenience is no worth it for a major source of bugs in client code... Even bugs that will show itself at unexpected times (again, try to run our Integration tests. They're mostly broken due to having silly assumptions that this would encourage more of, and DB upgrades reordering tables and schemas).


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    Took a look (phone code-browsing was not as bad as I thought it would be) . Code looks fine, but the whole idea is pretty terrible. We have plenty of problems with datastores being vulnerable to order change (just look at our integration tests where half the issues are due to order being assumed), and this will make it worse.
    
    As I read from the issue, its MEANT for single table datastores, but that's not the implementation. This _will_ be used for more and it_will_ break when they update their databases.
    
    I'd either drop it completely or find a format that can only be used for single-table datastores, not even for multi table datastores that currently contains a single table. 


---

[GitHub] metamodel issue #163: METAMODEL-1165: Fixed queries w/o table names (syntax:...

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/163
  
    If I was to paraphrase then you're saying that just because not every `List` is ordered, you shouldn't have a `get(int)` method? I don't think that's reasonable :-) I agree that it might not be what you want to use in many (most) situations. But if you know that you're looking for "table no. X" then it adds a small convenience. Or maybe you know the amount of tables, but not their names, and you want to fire a query to select the count of all tables?


---