You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2009/10/02 01:15:25 UTC
[jira] Created: (MAHOUT-184) Code tweaks for .df.* code
Code tweaks for .df.* code
--------------------------
Key: MAHOUT-184
URL: https://issues.apache.org/jira/browse/MAHOUT-184
Project: Mahout
Issue Type: Improvement
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor
Fix For: 0.2
This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (MAHOUT-184) Code tweaks for .df.* code
Posted by deneche abdelhakim <ad...@gmail.com>.
Sure.
On Fri, Oct 2, 2009 at 8:59 AM, Isabel Drost (JIRA) <ji...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761501#action_12761501 ]
>
> Isabel Drost commented on MAHOUT-184:
> -------------------------------------
>
> Looks good to me. Deneche, could you please also have a look at the patch to spot any issues early on?
>
> I would prefer using CLI for the job implementation (core/src/main/java/org/apache/mahout/cf/taste/hadoop/RecommenderJob.java), but that can be done in a later patch.
>
>
>
>> Code tweaks for .df.* code
>> --------------------------
>>
>> Key: MAHOUT-184
>> URL: https://issues.apache.org/jira/browse/MAHOUT-184
>> Project: Mahout
>> Issue Type: Improvement
>> Reporter: Sean Owen
>> Assignee: Sean Owen
>> Priority: Minor
>> Fix For: 0.2
>>
>> Attachments: Tweaks_to__df__.patch
>>
>>
>> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
[jira] Updated: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-184:
-----------------------------
Attachment: MAHOUT-184.patch
More stuff I'd like to change, on a related note -- in particular here I'd like to not throw Exception (except from main() or test methods), nor RuntimeException, and instead use the most appropriate standard subclass.
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-184:
-----------------------------
Attachment: Tweaks_to__df__.patch
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761819#action_12761819 ]
Deneche A. Hakim commented on MAHOUT-184:
-----------------------------------------
I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:
* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
private int node2Type() {
+ // TODO shouldn't subclasses override this method then??
{/code}
the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.
* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:
{code}
+ protected static void checkCategorical(double[][] source, List<Integer> missings,
Data loaded, int attr, int aId, double oValue, double nValue) {
int lind = 0;
@@ -286,7 +277,7 @@
continue;
if (source[index][attr] == oValue) {
- assertTrue(nValue == loaded.get(lind).get(aId));
+ assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{/code}
I think you meant :
{code}
+ assertEquals(nValue, loaded.get(lind).get(aId));
{/code}
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (MAHOUT-184) Code tweaks for .df.*
code
Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761819#action_12761819 ]
Deneche A. Hakim edited comment on MAHOUT-184 at 10/2/09 11:32 PM:
-------------------------------------------------------------------
I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:
* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
private int node2Type() {
+ // TODO shouldn't subclasses override this method then??
{code}
the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.
* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:
{code}
+ protected static void checkCategorical(double[][] source, List<Integer> missings,
Data loaded, int attr, int aId, double oValue, double nValue) {
int lind = 0;
@@ -286,7 +277,7 @@
continue;
if (source[index][attr] == oValue) {
- assertTrue(nValue == loaded.get(lind).get(aId));
+ assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{code}
I think you meant :
{code}
+ assertEquals(nValue, loaded.get(lind).get(aId));
{code}
was (Author: adeneche):
I'm having trouble applying the patch, got a lot of: "Hunk #1 FAILED at XX."
I inspected the patch without applying it and it look fine, although I have two comments:
* comment 1:
in core/main/.../df/node/Node.java you added the following comment:
{code}
private int node2Type() {
+ // TODO shouldn't subclasses override this method then??
{/code}
the problem is that node2type() is used when writing a Node to actually store its class in the form of an Integer. The static method Node.read(DataInput in) is used to read the node delegating the reading to the correct implementation. Overriding node2Type() without modifying Node.read(DataInput in) is a certain way to have headaches !!!
A better solution is to store the class name and use class loaders, but for now I don't really need it.
* comment 2:
in core/src/test/java/org/apache/mahout/df/data/DataLoaderTest.java you made the following change:
{code}
+ protected static void checkCategorical(double[][] source, List<Integer> missings,
Data loaded, int attr, int aId, double oValue, double nValue) {
int lind = 0;
@@ -286,7 +277,7 @@
continue;
if (source[index][attr] == oValue) {
- assertTrue(nValue == loaded.get(lind).get(aId));
+ assertEquals(nValue, loaded.get(lind).get(aId), 0.0);
{/code}
I think you meant :
{code}
+ assertEquals(nValue, loaded.get(lind).get(aId));
{/code}
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761854#action_12761854 ]
Sean Owen commented on MAHOUT-184:
----------------------------------
True, either way these two methods must be updated together. write()
and readFields() must be updated in parallel in the subclasses or else
things break too. It feels more object-oriented to override the method
in subclasses (purists might suggest that the Node enum itself should
have a method that returns a corresponding instance of an
implementation) but it is not something I personally feel so strongly
about.
It's the same, the last arg defaults to 0.0 anyway. Assuming they
really should be the exact same double value and can't vary due to
intermediate rounding, this is right.
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-184.
------------------------------
Resolution: Fixed
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Deneche A. Hakim (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762089#action_12762089 ]
Deneche A. Hakim commented on MAHOUT-184:
-----------------------------------------
.df* and .ga.* changes looks good to me
but I'm still enable to apply the patch, I'm getting a lot of "Hunk...failed..." !
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: MAHOUT-184.patch, Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-184) Code tweaks for .df.* code
Posted by "Isabel Drost (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761501#action_12761501 ]
Isabel Drost commented on MAHOUT-184:
-------------------------------------
Looks good to me. Deneche, could you please also have a look at the patch to spot any issues early on?
I would prefer using CLI for the job implementation (core/src/main/java/org/apache/mahout/cf/taste/hadoop/RecommenderJob.java), but that can be done in a later patch.
> Code tweaks for .df.* code
> --------------------------
>
> Key: MAHOUT-184
> URL: https://issues.apache.org/jira/browse/MAHOUT-184
> Project: Mahout
> Issue Type: Improvement
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.2
>
> Attachments: Tweaks_to__df__.patch
>
>
> This follows on my last email to the mailing list, and code inspection. It's big enough I made a patch. No surprises I hope given the consensus on code style and practice. Might be some good takeaways in here, or points for further discussion.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.