Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2009/09/15 14:49:57 UTC

[jira] Created: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Rationalize 'utils' and 'common' stuff
--------------------------------------

                 Key: MAHOUT-178
                 URL: https://issues.apache.org/jira/browse/MAHOUT-178
             Project: Mahout
          Issue Type: Improvement
    Affects Versions: 0.1
            Reporter: Sean Owen
            Assignee: Sean Owen
            Priority: Minor


Every project needs a common area for code that is not obviously part of any specific piece of the project, typically because it's used in many places. This is good, as it promotes reuse. I would like to make an explicit effort to rationalize this project's approach to 'common', starting with some basic reshuffling, which will then pave the way for unifying more of the code that is currently duplicated (thinking: caches, distance measures, Hadoop integration, etc.)

Right now we have this common code in three places, when it seems like there should be basically one:
- mahout-core: org.apache.mahout.utils
- mahout-core: org.apache.mahout.common
- mahout-utils

I suggest that, of the two packages named above, 'common' is slightly preferable; the two could easily just be merged. I would also like to ask: does it make sense to have a mahout-utils module at all? In my opinion, it's like having a mahout-core-core. It appears to serve exactly the same role as the other utils/common package. Would it ever be used as a standalone build product?

Renaming may sound like a trivial change, but I think the above is merely symptomatic of several developers having independent ideas about where to stash common stuff. I want to force the issue and push everyone's stuff together to begin the hard but necessary work of refactoring the code base into something more unified.


For now, I propose pushing all of this code together into org.apache.mahout.common. This is a big enough bang, and will break enough outstanding patches, that I want to propose it first and, if agreed, plan when to commit.

(Also, shouldn't stuff like the distance measure classes be in their own package?)

Anyway, a partial patch will be attached shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756541#action_12756541 ] 

Grant Ingersoll commented on MAHOUT-178:
----------------------------------------

I'm fine with 'common' as a package name, although to me it implies code that is shared between modules.


[jira] Resolved: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-178.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.2


[jira] Commented: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756556#action_12756556 ] 

Sean Owen commented on MAHOUT-178:
----------------------------------

I see. If you are saying mahout-utils really has this identity and is not just another word for common code, then it should remain as is, and I will revert that part. I also think it can and should stay in a 'utils' package. As long as everyone shares that rough understanding and organizes code accordingly, cool.

We have -examples and -utils, and now I am proposing -sandbox. I think they all have coherent identities, then; I'm just making sure people agree that makes sense.

I agree that 'common' is (only) for stuff shared by modules, and belongs in -core.


[jira] Commented: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756539#action_12756539 ] 

Grant Ingersoll commented on MAHOUT-178:
----------------------------------------

To me, the mahout-utils module is where we can put tools that help get things ready for Mahout. It can bring in libraries like Lucene, Tika, etc. to prepare raw content for use by Mahout, and also provide utilities that might be helpful in dealing with output. I don't think it belongs in core, because not everyone will need it, and keeping it separate helps keep core focused on providing algorithm implementations.


[jira] Updated: (MAHOUT-178) Rationalize 'utils' and 'common' stuff

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-178:
-----------------------------

    Attachment: MAHOUT-178.patch

I would like to commit this patch shortly. It's big, but it mostly just moves files and changes imports. It consolidates everything into the 'common' package, and should account for most of the patch breakage this reorganization will cause.

Next, I'd like to move the current mahout-utils code to mahout-common (same package) and convert that module to mahout-sandbox.
