You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by kaspersorensen <gi...@git.apache.org> on 2016/04/23 00:54:45 UTC

[GitHub] metamodel pull request: (draft) METAMODEL-244

GitHub user kaspersorensen opened a pull request:

    https://github.com/apache/metamodel/pull/96

    (draft) METAMODEL-244

    I wanted to give a shot at fixing METAMODEL-244 so here goes.
    
    I made available a strategy interface for column naming, which I implemented in a few different ways, and created a default delegating strategy as well. It's implemented in the 'fixedwidth' module only for now.
    
    I also managed to fix something which I've encountered a few times before, but newer got around to report as an issue: The fact that it is annoying that you sometimes get duplicated column names for CSV files and such. That should rarely be allowed, so I implemented a strategy for renaming those duplicated column names automatically.
    
    I'd like a review if anyone is willing. If you agree that the approach taken is good I will happily also apply it in the excel and csv module. It would be quite easy I think actually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kaspersorensen/metamodel METAMODEL-244

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/96.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #96
    
----
commit 3d0666cabce5e6df40ec341bb1d21b74343d1e3f
Author: kaspersorensen <i....@gmail.com>
Date:   2016-04-22T21:54:22Z

    METAMODEL-244: Made ColumnNamingStrategy available to fixedwidth module

commit 0da8f1a1cb9fa8bfe87eb73f38df047d111e196d
Author: kaspersorensen <i....@gmail.com>
Date:   2016-04-22T22:50:06Z

    METAMODEL-244: Refactored fixedwidth module a bit and added strategies

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/96#discussion_r61097276
  
    --- Diff: core/src/main/java/org/apache/metamodel/schema/naming/DefaultColumnNamingStrategy.java ---
    @@ -0,0 +1,35 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.metamodel.schema.naming;
    +
    +import org.apache.metamodel.schema.builder.DelegatingIntrinsicSwitchColumnNamingStrategy;
    +
    +/**
    + * The default (in most cases) {@link ColumnNamingStrategy} to use when no other
    + * strategy is specified.
    + */
    +public class DefaultColumnNamingStrategy extends DelegatingIntrinsicSwitchColumnNamingStrategy {
    --- End diff --
    
    Ok I will make a ColumnNamingStrategies class or such instead 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by albertostratio <gi...@git.apache.org>.
Github user albertostratio commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214657798
  
    @kaspersorensen Do we need all the ColumnNamingStrategy interfaces and implementations stuff to just get the names of the columns. It seems like kinda over-engineering to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214863231
  
    Renamed it. Will merge it now then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214724511
  
    Code mostly looks good. But it is quite a bit more complicated than I expected.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/96#discussion_r61075904
  
    --- Diff: core/src/main/java/org/apache/metamodel/schema/naming/UniqueColumnNamingStrategy.java ---
    @@ -0,0 +1,62 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.metamodel.schema.naming;
    +
    +import java.util.HashSet;
    +import java.util.Set;
    +
    +/**
    + * A {@link ColumnNamingStrategy} that uses the intrinsic column names, but
    + * ensures that all column names are unique. When duplicate names are
    + * encountered a number will be appended yielding column names like "name",
    + * "name1", "name2" etc.
    + */
    +public class UniqueColumnNamingStrategy implements ColumnNamingStrategy {
    +
    +    private static final long serialVersionUID = 1L;
    +
    +    @Override
    +    public ColumnNamingSession startColumnNamingSession() {
    +        return new ColumnNamingSession() {
    +
    +            private final Set<String> names = new HashSet<>();
    +
    +            @Override
    +            public String getNextColumnName(ColumnNamingContext ctx) {
    +                final String intrinsicName = ctx.getIntrinsicColumnName();
    +                boolean unique = names.add(intrinsicName);
    +                if (unique) {
    +                    return intrinsicName;
    +                }
    +
    +                String newName = null;
    +                for (int i = 2; !unique; i++) {
    +                    newName = intrinsicName + i;
    --- End diff --
    
    Wouldn't an underscore before be better? Some columns may already end with numbers.
    
    That could lead to endless confusion, e.g. something like "is_iso9001" would become "is_iso90011", which is a different standard (no, I have no good reason why this exact example would ever happen, but it's easily a general problem :))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214036487
  
    Also implemented this for CSV module now. And added a 'session' abstraction to encapsulate the state that is needed while applying column names for a table.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/96


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214864543
  
    Yep. A bit "dangerous" to still get it with `ColumnNamingStrategies.defaultStrategy()`, but nevermind.  :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214856085
  
    Looks good! Maybe I'd let the test be DelegatingIntrinsicSwitchColumnNamingStrategyTest. That it is the default is maybe less relevant for the test than what it really is testing. But that's up to you, not a deal breaker for me.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the pull request:

    https://github.com/apache/metamodel/pull/96#issuecomment-214852899
  
    Changed as per review remarks. What do you think now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: (draft) METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/96#discussion_r60834460
  
    --- Diff: core/src/main/java/org/apache/metamodel/schema/builder/ColumnNamingStrategy.java ---
    @@ -0,0 +1,37 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.metamodel.schema.builder;
    +
    +/**
    + * A strategy that defines how columns are logically named. Such strategies are
    + * mostly used when a particular datastore is not itself intrinsically
    + * specifying the column name.
    + */
    +public interface ColumnNamingStrategy {
    +
    +    /**
    +     * Provides the name to apply for a given column.
    +     * 
    +     * @param ctx
    +     *            the context of the column naming taking place. This contains
    +     *            column index, intrinsic name etc. if available.
    +     * @return the name to provide to the column.
    +     */
    +    public String getNextColumnName(ColumnNamingContext ctx);
    --- End diff --
    
    I'm wondering if this method should reside here or in some object that is scoped just for the individual table. Because right now the interface seems to be global fir many tables but implementations typically need state which is scoped only for the duration of naming a single table.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/96#discussion_r61096874
  
    --- Diff: core/src/main/java/org/apache/metamodel/schema/naming/UniqueColumnNamingStrategy.java ---
    @@ -0,0 +1,62 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.metamodel.schema.naming;
    +
    +import java.util.HashSet;
    +import java.util.Set;
    +
    +/**
    + * A {@link ColumnNamingStrategy} that uses the intrinsic column names, but
    + * ensures that all column names are unique. When duplicate names are
    + * encountered a number will be appended yielding column names like "name",
    + * "name1", "name2" etc.
    + */
    +public class UniqueColumnNamingStrategy implements ColumnNamingStrategy {
    +
    +    private static final long serialVersionUID = 1L;
    +
    +    @Override
    +    public ColumnNamingSession startColumnNamingSession() {
    +        return new ColumnNamingSession() {
    +
    +            private final Set<String> names = new HashSet<>();
    +
    +            @Override
    +            public String getNextColumnName(ColumnNamingContext ctx) {
    +                final String intrinsicName = ctx.getIntrinsicColumnName();
    +                boolean unique = names.add(intrinsicName);
    +                if (unique) {
    +                    return intrinsicName;
    +                }
    +
    +                String newName = null;
    +                for (int i = 2; !unique; i++) {
    +                    newName = intrinsicName + i;
    --- End diff --
    
    Agreed, will add an underscore. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: METAMODEL-244

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/96#discussion_r61074112
  
    --- Diff: core/src/main/java/org/apache/metamodel/schema/naming/DefaultColumnNamingStrategy.java ---
    @@ -0,0 +1,35 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.metamodel.schema.naming;
    +
    +import org.apache.metamodel.schema.builder.DelegatingIntrinsicSwitchColumnNamingStrategy;
    +
    +/**
    + * The default (in most cases) {@link ColumnNamingStrategy} to use when no other
    + * strategy is specified.
    + */
    +public class DefaultColumnNamingStrategy extends DelegatingIntrinsicSwitchColumnNamingStrategy {
    --- End diff --
    
    (repeat of #97 comment, just in the right place :))
    That's a pretty weird way to make something a default to me.
    
    This screams for some sort of proper factory.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---