You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2009/10/02 01:15:25 UTC

[jira] Created: (MAHOUT-184) Code tweaks for .df.* code

Code tweaks for .df.* code
--------------------------

                 Key: MAHOUT-184
                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
             Project: Mahout
          Issue Type: Improvement
            Reporter: Sean Owen
            Assignee: Sean Owen
            Priority: Minor
             Fix For: 0.2


This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (MAHOUT-184) Code tweaks for .df.* code

Posted by deneche abdelhakim <ad...@gmail.com>.
Sure.

On Fri, Oct 2, 2009 at 8:59 AM, Isabel Drost (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761501#action_12761501 ]
>
> Isabel Drost commented on MAHOUT-184:
> -------------------------------------
>
> Looks good to me. Deneche, could you please also have a look at the patch to spot any issues early on?
>
> I would prefer using CLI for the job implementation (core/src/main/java/org/apache/mahout/cf/taste/hadoop/RecommenderJob.java), but that can be done in a later patch.
>
>
>
>> Code tweaks for .df.* code
>> --------------------------
>>
>>                 Key: MAHOUT-184
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>>             Project: Mahout
>>          Issue Type: Improvement
>>            Reporter: Sean Owen
>>            Assignee: Sean Owen
>>            Priority: Minor
>>             Fix For: 0.2
>>
>>         Attachments: Tweaks_to__df__.patch
>>
>>
>> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

[jira] Updated: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-184:
-----------------------------

    Attachment: MAHOUT-184.patch

More stuff I'd like to change, on a related note -- in particular here I'd like to not throw Exception (except from main() or test methods), nor RuntimeException, and instead use the most appropriate standard subclass.

> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-184:
-----------------------------

    Attachment: Tweaks_to__df__.patch

> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761819#action_12761819 ] 

Deneche A. Hakim commented on MAHOUT-184:
-----------------------------------------

I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:

* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
   private int node2Type() {
+    // TODO shouldn't subclasses override this method then??
{/code}

the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.

* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:

{code}
+  protected static void checkCategorical(double[][] source, List<Integer> missings,
       Data loaded, int attr, int aId, double oValue, double nValue) {
     int lind = 0;
 
@@ -286,7 +277,7 @@
         continue;
 
       if (source[index][attr] == oValue) {
-        assertTrue(nValue == loaded.get(lind).get(aId));
+        assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{/code}

I think you meant :

{code}
+        assertEquals(nValue, loaded.get(lind).get(aId)); 
{/code}


> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761819#action_12761819 ] 

Deneche A. Hakim edited comment on MAHOUT-184 at 10/2/09 11:32 PM:
-------------------------------------------------------------------

I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:

* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
   private int node2Type() {
+    // TODO shouldn't subclasses override this method then??
{code}

the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.

* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:

{code}
+  protected static void checkCategorical(double[][] source, List<Integer> missings,
       Data loaded, int attr, int aId, double oValue, double nValue) {
     int lind = 0;
 
@@ -286,7 +277,7 @@
         continue;
 
       if (source[index][attr] == oValue) {
-        assertTrue(nValue == loaded.get(lind).get(aId));
+        assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{code}

I think you meant :

{code}
+        assertEquals(nValue, loaded.get(lind).get(aId)); 
{code}


      was (Author: adeneche):
    I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:

* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
   private int node2Type() {
+    // TODO shouldn't subclasses override this method then??
{/code}

the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.

* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:

{code}
+  protected static void checkCategorical(double[][] source, List<Integer> missings,
       Data loaded, int attr, int aId, double oValue, double nValue) {
     int lind = 0;
 
@@ -286,7 +277,7 @@
         continue;
 
       if (source[index][attr] == oValue) {
-        assertTrue(nValue == loaded.get(lind).get(aId));
+        assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{/code}

I think you meant :

{code}
+        assertEquals(nValue, loaded.get(lind).get(aId)); 
{/code}

  
> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761854#action_12761854 ] 

Sean Owen commented on MAHOUT-184:
----------------------------------


True, either way these two methods must be updated together. write()
and readFields() must be updated in parallel in the subclasses or else
things break too. It feels more object-oriented to override the method
in subclasses (purists might suggest that the Node enum itself should
have a method that returns a corresponding instance of an
implementation) but it is not something I personally feel so strongly
about.


It's the same, the last arg defaults to 0.0 anyway. Assuming they
really should be the exact same double value and can't vary due to
intermediate rounding, this is right.


> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-184.
------------------------------

    Resolution: Fixed

> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762089#action_12762089 ] 

Deneche A. Hakim commented on MAHOUT-184:
-----------------------------------------

.df* and .ga.* changes looks good to me

but I'm still enable to apply the patch, I'm getting a lot of "Hunk...failed..." !

> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code

Posted by "Isabel Drost (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761501#action_12761501 ] 

Isabel Drost commented on MAHOUT-184:
-------------------------------------

Looks good to me. Deneche, could you please also have a look at the patch to spot any issues early on?

I would prefer using CLI for the job implementation (core/src/main/java/org/apache/mahout/cf/taste/hadoop/RecommenderJob.java), but that can be done in a later patch.



> Code tweaks for .df.* code
> --------------------------
>
>                 Key: MAHOUT-184
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-184
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.