Posted to dev@mahout.apache.org by "Jeremy Chow (JIRA)" <ji...@apache.org> on 2008/05/11 09:25:55 UTC

[jira] Created: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

parallelize k-means sharing the predominance of canopies
--------------------------------------------------------

                 Key: MAHOUT-54
                 URL: https://issues.apache.org/jira/browse/MAHOUT-54
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.1
         Environment: OS Independent
            Reporter: Jeremy Chow
             Fix For: 0.1


Mahout's current implementation uses the canopy algorithm only to create the initial cluster centroids for k-means. While iterating, it still calculates the distance from each center to every point. But the most important improvement canopies offer is that each center only needs its distance to the much smaller number of points that lie in the same canopy.
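To make the proposed saving concrete, here is a minimal Java sketch (not Mahout code; the class and method names are hypothetical) that counts distance computations for one k-means assignment pass. It assumes each point's canopy memberships are already known, as they are after the canopy step, and that the k-means centers still correspond one-to-one with the canopies.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch, not Mahout code: counts the point-to-center distance
 * computations in one k-means assignment pass, with and without restricting
 * each point to the centers of its own canopies.
 */
final class CanopyDistanceCount {

  /** Plain k-means: every point is compared with every center. */
  static long fullComparisons(int numPoints, int numCenters) {
    return (long) numPoints * numCenters;
  }

  /**
   * Canopy-restricted k-means: each point is compared only with the centers
   * of the canopies it belongs to. Element p of canopiesOfPoint holds the
   * (hypothetical) canopy ids of point p.
   */
  static long canopyComparisons(List<int[]> canopiesOfPoint) {
    long total = 0;
    for (int[] canopies : canopiesOfPoint) {
      total += canopies.length;
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up membership for 6 points and 4 canopies; canopies may overlap.
    List<int[]> membership = Arrays.asList(
        new int[] {0}, new int[] {0, 1}, new int[] {1},
        new int[] {2}, new int[] {2, 3}, new int[] {3});
    System.out.println("full:   " + fullComparisons(membership.size(), 4)); // 24
    System.out.println("canopy: " + canopyComparisons(membership));         // 8
  }
}
```

With this made-up membership the full pass needs 6 x 4 = 24 distance computations and the restricted pass needs 8. The real ratio depends on how many canopies each point falls into, so the size of the win has to be measured on actual data.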

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by Ted Dunning <te...@gmail.com>.
How large is the win for canopies in practice?  < 10x?  < 2x?

I could imagine that, since the distance computations are (or should be)
heavily vector oriented, increasing the vector length by comparing to all
centroids causes only a sub-linear increase in time, because much of the
computation time would be spent setting up the computation in the first
place.  Once the cluster centroids are in L1 cache, using them should be
really, really fast.
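One way to read the point about vector-oriented distance computations is as a data-layout argument: if all centroids are packed into one contiguous array, a brute-force nearest-centroid scan is a tight sequential loop. The sketch below is a hypothetical illustration in plain Java (not Mahout's Vector API); it measures nothing, so whether this layout erases the canopy advantage remains an empirical question.

```java
/**
 * Hypothetical sketch: brute-force nearest-centroid search with all k
 * centroids packed row-major into one contiguous double[]. Centroid c
 * occupies indices [c * dim, (c + 1) * dim).
 */
final class FlatCentroidSearch {

  /**
   * Returns the index of the centroid closest to point. Squared Euclidean
   * distance is used because it preserves the ordering of true distances
   * and avoids a sqrt per comparison.
   */
  static int nearest(double[] centroids, int dim, double[] point) {
    int k = centroids.length / dim;
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < k; c++) {
      int base = c * dim; // start of centroid c in the flat array
      double d = 0.0;
      for (int j = 0; j < dim; j++) {
        double diff = centroids[base + j] - point[j];
        d += diff * diff;
      }
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Three 2-D centroids: (0,0), (5,5), (10,0).
    double[] centroids = {0, 0, 5, 5, 10, 0};
    System.out.println(nearest(centroids, 2, new double[] {9, 1})); // 2
  }
}
```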

On Mon, May 12, 2008 at 9:22 PM, Jeff Eastman (JIRA) <ji...@apache.org>
wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596266#action_12596266 ]
>
> Jeff Eastman commented on MAHOUT-54:
> ------------------------------------
>
> What I gather is that you are concerned about kmeans comparing all points
> against all cluster centers in order to find the closest. Since canopy has
> already assigned each point to one or more canopies, and since the kmeans
> cluster centers are initially the canopy centers, it should only be
> necessary to measure the distance between each point and its canopy
> cluster centers, not all of the cluster centers. Then, the point would
> only be emitted to the closest cluster and many distance calculations
> could be avoided.
>
> I'd still like to understand the changes you are proposing to the existing
> algorithms. The code in your patch does little to motivate or explain its
> differences and indeed it breaks the existing canopy unit tests. If your
> patch were instead organized to make as few changes to the code as possible
> and if these changes were well documented it would be easier to evaluate.
> Currently, one must compare your new implementation with the existing,
> somewhat modified implementation without the benefit of diff or any other
> documentation to see what has actually changed.
>
> It appears you wish to augment the canopy code to produce an additional
> output folder, and that kmeans would be able to utilize this folder to
> optimize its measurements. Could you say more about the structure of this
> new folder and how you intend to use it in kmeans?
>


-- 
ted

[jira] Updated: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Jeremy Chow (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Chow updated MAHOUT-54:
------------------------------

    Attachment: canopykeams.patch

original implementation



[jira] Updated: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-54:
----------------------------

    Fix Version/s:     (was: 0.2)

> parallelize k-means sharing the predominance of canopies
> --------------------------------------------------------
>
>                 Key: MAHOUT-54
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-54
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>         Environment: OS Independent
>            Reporter: Min Zhou
>         Attachments: canopykeams.patch
>
>
> Mahout's current implementation uses the canopy algorithm only to create the initial cluster centroids for k-means. While iterating, it still calculates the distance from each center to every point. But the most important improvement canopies offer is that each center only needs its distance to the much smaller number of points that lie in the same canopy.



[jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596266#action_12596266 ] 

Jeff Eastman commented on MAHOUT-54:
------------------------------------

What I gather is that you are concerned about kmeans comparing all points against all cluster centers in order to find the closest. Since canopy has already assigned each point to one or more canopies, and since the kmeans cluster centers are initially the canopy centers, it should only be necessary to measure the distance between each point and its canopy cluster centers, not all of the cluster centers. Then, the point would only be emitted to the closest cluster and many distance calculations could be avoided.

I'd still like to understand the changes you are proposing to the existing algorithms. The code in your patch does little to motivate or explain its differences, and indeed it breaks the existing canopy unit tests. If your patch were instead organized to make as few changes to the code as possible, and if these changes were well documented, it would be easier to evaluate. Currently, one must compare your new implementation with the existing, somewhat modified implementation without the benefit of diff or any other documentation to see what has actually changed.

It appears you wish to augment the canopy code to produce an additional output folder, and that kmeans would be able to utilize this folder to optimize its measurements. Could you say more about the structure of this new folder and how you intend to use it in kmeans?
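The restricted comparison described in this comment could look roughly like the following plain-Java sketch. The names are hypothetical, and it assumes k-means center i was seeded from canopy i, so that a point's canopy ids can be used directly as center indices.

```java
/**
 * Hypothetical sketch of canopy-restricted assignment: a point is compared
 * only with the centers of the canopies it belongs to and is assigned to the
 * closest of those. Assumes center i was seeded from canopy i.
 */
final class CanopyRestrictedAssign {

  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Returns the id of the closest center among this point's canopies only,
   * or -1 if pointCanopies is empty. Centers of other canopies are skipped.
   */
  static int closestCenter(double[] point, int[] pointCanopies, double[][] centers) {
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c : pointCanopies) {
      double d = squaredDistance(point, centers[c]);
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] centers = {{0, 0}, {4, 0}, {100, 100}};
    // The point belongs to canopies 0 and 1 only, so center 2 is never measured.
    System.out.println(closestCenter(new double[] {3, 0}, new int[] {0, 1}, centers)); // 1
  }
}
```

Because centers drift away from their original canopy centers over iterations, the closest center found this way is not guaranteed to be the globally closest one; that approximation is the trade canopies make for fewer distance computations.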





[jira] Resolved: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-54.
-----------------------------

    Resolution: Won't Fix

It appears this one has been inactive for over a year, and the patch as submitted was not agreed upon as something committable.



[jira] Updated: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-54:
----------------------------------

    Fix Version/s:     (was: 0.1)
                   0.2

move to 0.2



Re: Code Formatting was: Re: [jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by Jeff Eastman <je...@windwardsolutions.com>.
I thought I'd seen those somewhere before, but could not find them again 
when I needed them. Thanks for pointing them out.

Jeff


Grant Ingersoll wrote:
> There are IntelliJ and Eclipse styles at the bottom of: 
> http://cwiki.apache.org/MAHOUT/howtocontribute.html
>
> Perhaps we should move them up...
>
> On May 12, 2008, at 4:05 PM, Jeff Eastman wrote:
>
>> Grant Ingersoll wrote:
>>>
>>> On May 11, 2008, at 6:07 PM, Jeff Eastman (JIRA) wrote:
>>>>
>>>> - the pretty-printing rules are not those specified by ASF: the
>>>> Java conventions with tabs replaced by 2 spaces vs. 4 spaces. The
>>>> patch unnecessarily changes the formatting of several of the existing
>>>> canopy files
>>>> - the patch introduces @author tags, which are contrary to ASF
>>>> policy. These were likely added by Eclipse but should be removed
>>>
>>> Just to be clear, these are conventions we (Mahout/Lucene) adopt, 
>>> not necessarily ASF policy.  We can always remove/clean up before 
>>> committing.  The big thing about formatting is that we should try to 
>>> keep it clean going in, but we shouldn't necessarily change it once 
>>> it is in.
>>>
>>>
>> Thanks Grant,
>>
>> This is a good clarification and I've revised my mental model 
>> accordingly. For those of us who are using Eclipse, it would be good 
>> to use the same formatter configuration so I've attached the one I 
>> use. I tend to use the pretty printer quite often when I'm writing 
>> code and this one seems to leave the existing files intact. If folks 
>> are ok with it, I can post it to the wiki.
>>
>> Jeff
>> <?xml version="1.0" encoding="UTF-8"?>
>> <profiles version="11">
>> <profile kind="CodeFormatterProfile" name="Apache Conventions" 
>> version="11">
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_if" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_assert" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_enum_constant" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_semicolon" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.align_type_members_on_columns" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_case" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.format_line_comments" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.number_of_empty_lines_to_preserve" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_brackets_in_array_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_switch" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_between_type_declarations" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_return" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_method_body" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_annotation_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_statements_compare_to_body" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_after_opening_brace_in_array_initializer" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.format_guardian_clause_on_one_line" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.insert_new_line_before_root_tags" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_for" 
>> value="insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.tabulation.size" value="2"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_imports" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_case" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_constant_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_new_chunk" 
>> value="1"/>
>> <setting id="org.eclipse.jdt.core.formatter.continuation_indentation" 
>> value="2"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_binary_operator" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_parameters" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_for" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_superinterfaces" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_parameters_in_method_declaration" 
>> value="16"/>
>> <setting id="org.eclipse.jdt.core.formatter.alignment_for_assignment" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_member_type" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_throws" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_conditional_expression" 
>> value="80"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_while" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.indent_parameter_description" 
>> value="true"/>
>> <setting id="org.eclipse.jdt.core.formatter.comment.format_html" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_allocation_expression" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_throws" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_enum_constant" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.format_source_code" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_declarations" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_parameterized_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_annotation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_conditional" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_unary_operator" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_question_in_conditional" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_annotation_declaration" 
>> value="insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.indentation.size" 
>> value="4"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_local_declarations" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_postfix_operator" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_enum_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_constant_arguments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_semicolon_in_for" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_constructor_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation_type_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_explicitconstructorcall_arguments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_anonymous_type_declaration" 
>> value="end_of_line"/>
>> <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="80"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_block" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_invocation_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_while" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_constant" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_block_comment" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_at_in_annotation_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_constant" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_before_closing_brace_in_array_initializer" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_array_initializer" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_superclass_in_type_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_cast" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_synchronized" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.comment.format_header" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_for" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_before_else_in_if_statement" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_explicit_constructor_call" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_allocation_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_multiple_fields" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_at_end_of_file_if_missing" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_explicitconstructorcall_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_block" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_closing_paren_in_cast" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_before_finally_in_try_statement" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.keep_then_statement_on_same_line" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_binary_operator" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_annotation_declaration_header" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_constructor_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_expressions_in_array_initializer" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_method_declaration" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_enum_constant" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_annotation_type_member_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_parameterized_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation_type_member_declaration" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_field" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_method_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_parameters" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_switch" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.format_javadoc_comments" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_allocation_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.format_block_comments" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_array_initializer" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_anonymous_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_binary_expression" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_braces_in_array_initializer" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.wrap_before_binary_operator" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_after_package" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_catch" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_type_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_labeled_statement" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_semicolon_in_for" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_and_in_type_parameter" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_catch" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_before_while_in_do_statement" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_between_import_groups" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_throws" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_prefix_operator" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_ellipsis" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_constructor_declaration" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_question_in_wildcard" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_javadoc_comment" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_allocation_expression" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_after_imports" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_conditional" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_parameterized_type_reference" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_before_catch_in_try_statement" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.compiler.problem.assertIdentifier" 
>> value="error"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_enum_constant" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_block_in_case" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_enum_declaration" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_increments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_for" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_first_class_body_declaration" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.keep_else_statement_on_same_line" 
>> value="false"/>
>> <setting id="org.eclipse.jdt.core.formatter.indent_empty_lines" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.comment.insert_new_line_for_parameter" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_throw" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_while" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_closing_brace_in_block" 
>> value="insert"/>
>> <setting id="org.eclipse.jdt.core.compiler.source" value="1.5"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_increments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_declaration_header" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_constructor_declaration" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.comment.line_length" 
>> value="80"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_prefix_operator" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_type_declaration" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_assignment_operator" 
>> value="insert"/>
>> <setting id="org.eclipse.jdt.core.compiler.compliance" value="1.5"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_invocation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_arguments" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.compact_else_if" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_declarations" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_question_in_conditional" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_parameters" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_invocation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.use_tabs_only_for_leading_indentations" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_arguments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_switch" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_parameters_in_constructor_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_brackets_in_array_allocation_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_for" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_constructor_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_synchronized" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.number_of_blank_lines_at_beginning_of_method_body" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_annotation" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_parameterized_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_switch" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_constructor_declaration" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_annotation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_if" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_default" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.compiler.problem.enumIdentifier" 
>> value="error"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_annotation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_enum_constant" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_array_initializer" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_catch" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_synchronized" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.keep_empty_array_initializer_on_one_line" 
>> value="false"/>
>> <setting id="org.eclipse.jdt.core.compiler.codegen.targetPlatform" 
>> value="1.5"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_switch" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_brace_in_array_initializer" 
>> value="insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.alignment_for_compact_if" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_question_in_wildcard" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_assert" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_method_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_ellipsis" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_qualified_allocation_expression" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_statements_compare_to_block" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_parenthesized_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_type_header" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_parameters" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_arguments" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.keep_imple_if_on_one_line" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_invocation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_local_declarations" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.brace_position_for_annotation_type_declaration" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_selector_in_method_invocation" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_constant_header" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_switch" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_assignment_operator" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.never_indent_line_comments_on_first_column" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_unary_operator" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_if" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_labeled_statement" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_allocation_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_cases" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.continuation_indentation_for_array_initializer" 
>> value="2"/>
>> <setting id="org.eclipse.jdt.core.formatter.comment.indent_root_tags" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_enum_constants" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_parameterized_type_reference" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_parenthesized_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_throws" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_constructor_declaration" 
>> value="16"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_method_invocation" 
>> value="16"/>
>> <setting id="org.eclipse.jdt.core.formatter.tabulation.char" 
>> value="space"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_package" 
>> value="0"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_invocation_arguments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.indent_breaks_compare_to_cases" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_inits" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_field_declarations" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_superinterfaces" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.put_empty_statement_on_new_line" 
>> value="true"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_parameters" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.blank_lines_before_method" 
>> value="1"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_field_declarations" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_inits" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_anonymous_type_declaration" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_new_line_after_annotation" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_cast" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_arguments" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.never_indent_block_comments_on_first_column" 
>> value="false"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_array_initializer" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_parenthesized_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_enum_constant" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_invocation" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_postfix_operator" 
>> value="do not insert"/>
>> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_block" 
>> value="end_of_line"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_after_opening_brace_in_array_initializer" 
>> value="insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_allocation_expression" 
>> value="do not insert"/>
>> <setting 
>> id="org.eclipse.jdt.core.formatter.insert_space_before_and_in_type_parameter" 
>> value="insert"/>
>> </profile>
>> </profiles>
>> <jeastman.vcf>
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>


Code Formatting was: Re: [jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by Grant Ingersoll <gs...@apache.org>.
There are IntelliJ and Eclipse styles at the bottom of http://cwiki.apache.org/MAHOUT/howtocontribute.html

Perhaps we should move them up...

On May 12, 2008, at 4:05 PM, Jeff Eastman wrote:

> Grant Ingersoll wrote:
>>
>> On May 11, 2008, at 6:07 PM, Jeff Eastman (JIRA) wrote:
>>>
>>> - the pretty-printing rules are not those specified by ASF: the
>>> Java conventions with tabs replaced by 2 spaces rather than 4. The
>>> patch unnecessarily changes the formatting of several existing
>>> canopy files
>>> - the patch introduces @author tags, which are contrary to ASF
>>> policy. These were likely added by Eclipse but should be removed
>>
>> Just to be clear, these are conventions we (Mahout/Lucene) adopt,  
>> not necessarily ASF policy.  We can always remove/clean up before  
>> committing.  The big thing about formatting is that we should try  
>> to keep it clean going in, but we shouldn't necessarily change it  
>> once it is in.
>>
>>
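The review comments above name two concrete problems in the patch: @author tags, and indentation that does not follow the 2-space convention. Here is a minimal sketch of a check that flags both before a patch is committed. It assumes a plain-text scan is good enough. The `StyleCheck` class and its messages are hypothetical and not part of Mahout. It reports tab characters only and does not validate indentation widths.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical style check (not part of Mahout) for the two issues raised in
 * the review: Javadoc author tags and tab characters in place of 2-space indents.
 */
public class StyleCheck {

  /** Returns one "line: problem" message per issue; an empty list means clean. */
  public static List<String> check(String source) {
    List<String> problems = new ArrayList<String>();
    String[] lines = source.split("\n");
    for (int i = 0; i < lines.length; i++) {
      int lineNo = i + 1; // report 1-based line numbers
      if (lines[i].contains("@author")) {
        problems.add(lineNo + ": @author tag");
      }
      if (lines[i].indexOf('\t') >= 0) {
        problems.add(lineNo + ": tab character");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // Hypothetical patch contents: one @author tag and one tab-indented field.
    String patch = "/**\n * @author someone\n */\npublic class Foo {\n\tint x;\n  int y;\n}\n";
    for (String problem : check(patch)) {
      System.out.println(problem);
    }
  }
}
```

On the sample in `main`, the check reports line 2 for the `@author` tag and line 5 for the tab. Grant would rather not reformat code that is already committed, so a check like this would run on new or changed files, not on the whole tree.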
> Thanks Grant,
>
> This is a good clarification, and I've revised my mental model
> accordingly. For those of us who are using Eclipse, it would be good
> to use the same formatter configuration, so I've attached the one I
> use. I tend to use the pretty printer quite often when I'm writing
> code, and this one seems to leave the existing files intact. If folks
> are OK with it, I can post it to the wiki.
>
> Jeff
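The attached profile is an ordinary XML file made of `<setting id="..." value="..."/>` entries. The entries that matter most for the 2-space rule are `tabulation.char` (space) and `tabulation.size` (2), and `lineSplit` is set to 80 columns. Below is a minimal sketch of reading such a profile to confirm those values before it goes on the wiki. It assumes only the JDK's built-in DOM parser. The `FormatterProfile` class is hypothetical and is not an Eclipse or Mahout API. Eclipse imports the file through its own formatter preferences, and this code only inspects it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for an Eclipse formatter profile (not an Eclipse API). */
public class FormatterProfile {

  private static final String PREFIX = "org.eclipse.jdt.core.formatter.";

  /** Collects every <setting id="..." value="..."/> element into an id -> value map. */
  public static Map<String, String> parse(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      NodeList settings = doc.getElementsByTagName("setting");
      Map<String, String> result = new LinkedHashMap<String, String>();
      for (int i = 0; i < settings.getLength(); i++) {
        Element setting = (Element) settings.item(i);
        result.put(setting.getAttribute("id"), setting.getAttribute("value"));
      }
      return result;
    } catch (Exception e) {
      // Parser configuration, SAX, or I/O failures all mean the profile is unusable.
      throw new IllegalArgumentException("Could not parse formatter profile", e);
    }
  }

  public static void main(String[] args) {
    // Abbreviated excerpt of the "Apache Conventions" profile from this thread.
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<profiles version=\"11\">"
        + "<profile kind=\"CodeFormatterProfile\" name=\"Apache Conventions\" version=\"11\">"
        + "<setting id=\"" + PREFIX + "tabulation.char\" value=\"space\"/>"
        + "<setting id=\"" + PREFIX + "tabulation.size\" value=\"2\"/>"
        + "<setting id=\"" + PREFIX + "lineSplit\" value=\"80\"/>"
        + "</profile></profiles>";
    Map<String, String> profile = parse(xml);
    for (String key : new String[] {"tabulation.char", "tabulation.size", "lineSplit"}) {
      System.out.println(key + " = " + profile.get(PREFIX + key));
    }
  }
}
```

Running `main` on this abbreviated excerpt prints `tabulation.char = space`, `tabulation.size = 2` and `lineSplit = 80`.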
> <?xml version="1.0" encoding="UTF-8"?>
> <profiles version="11">
> <profile kind="CodeFormatterProfile" name="Apache Conventions"  
> version="11">
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_if" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_assert" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_enum_constant" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_semicolon" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.align_type_members_on_columns" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_case" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_line_comments" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.number_of_empty_lines_to_preserve" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_brackets_in_array_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_switch" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_between_type_declarations" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_return" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_method_body" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_annotation_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_statements_compare_to_body" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_after_opening_brace_in_array_initializer" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.format_guardian_clause_on_one_line" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.insert_new_line_before_root_tags" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_for" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.tabulation.size" value="2"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_imports" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_case" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_constant_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_new_chunk" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.continuation_indentation" value="2"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_binary_operator" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_parameters" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_for" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_superinterfaces" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_parameters_in_method_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_assignment" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_member_type" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_throws" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_conditional_expression" value="80"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_while" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.indent_parameter_description" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_html" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_allocation_expression" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_throws" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_enum_constant" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_source_code" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_declarations" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_parameterized_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_annotation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_conditional" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_unary_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_question_in_conditional" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_annotation_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indentation.size" value="4"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_local_declarations" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_postfix_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_enum_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_enum_constant_arguments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_semicolon_in_for" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_constructor_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation_type_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_explicitconstructorcall_arguments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_anonymous_type_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="80"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_block" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_invocation_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_while" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_constant" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_block_comment" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_at_in_annotation_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_constant" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_type_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_before_closing_brace_in_array_initializer" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_array_initializer" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_superclass_in_type_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_cast" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_enum_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_synchronized" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_header" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_for" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_at_in_annotation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_before_else_in_if_statement" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_explicit_constructor_call" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_allocation_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_multiple_fields" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_at_end_of_file_if_missing" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_explicitconstructorcall_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_block" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_closing_paren_in_cast" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_before_finally_in_try_statement" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.keep_then_statement_on_same_line" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_binary_operator" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_annotation_declaration_header" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_constructor_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_expressions_in_array_initializer" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_method_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_enum_constant" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_annotation_type_member_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_type_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_angle_bracket_in_parameterized_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation_type_member_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_field" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_method_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_constructor_declaration_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_parameters" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_switch" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_javadoc_comments" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_allocation_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_annotation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.format_block_comments" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_array_initializer" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_in_empty_anonymous_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_binary_expression" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_braces_in_array_initializer" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.wrap_before_binary_operator" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_after_package" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_catch" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_superinterfaces_in_type_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_labeled_statement" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_semicolon_in_for" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_and_in_type_parameter" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_catch" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_before_while_in_do_statement" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_between_import_groups" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_method_declaration_throws" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_prefix_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_ellipsis" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_constructor_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_question_in_wildcard" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.clear_blank_lines_in_javadoc_comment" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_allocation_expression" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_after_imports" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_conditional" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_enum_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_parameterized_type_reference" value="insert"/>
> <setting  
> id 
> = 
> "org 
> .eclipse 
> .jdt.core.formatter.insert_new_line_before_catch_in_try_statement"  
> value="do not insert"/>
> <setting id="org.eclipse.jdt.core.compiler.problem.assertIdentifier" value="error"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_enum_constant" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_block_in_case" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_enum_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_increments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_for" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_first_class_body_declaration" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.keep_else_statement_on_same_line" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_empty_lines" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.insert_new_line_for_parameter" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_parenthesized_expression_in_throw" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_while" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_closing_brace_in_block" value="insert"/>
> <setting id="org.eclipse.jdt.core.compiler.source" value="1.5"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_increments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_declaration_header" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_constructor_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.line_length" value="80"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_prefix_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_type_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_assignment_operator" value="insert"/>
> <setting id="org.eclipse.jdt.core.compiler.compliance" value="1.5"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_method_invocation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.compact_else_if" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_enum_declarations" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_question_in_conditional" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_angle_bracket_in_type_parameters" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_method_invocation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.use_tabs_only_for_leading_indentations" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_type_arguments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_switch" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_parameters_in_constructor_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_brackets_in_array_allocation_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_for" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_constructor_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_synchronized" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.number_of_blank_lines_at_beginning_of_method_body" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_annotation" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_angle_bracket_in_parameterized_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_switch" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_constructor_declaration" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_annotation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_if" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_default" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.compiler.problem.enumIdentifier" value="error"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_annotation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_enum_constant" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_bracket_in_array_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_array_initializer" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_catch" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_synchronized" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.keep_empty_array_initializer_on_one_line" value="false"/>
> <setting id="org.eclipse.jdt.core.compiler.codegen.targetPlatform" value="1.5"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_switch" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_brace_in_array_initializer" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_compact_if" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_question_in_wildcard" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_colon_in_assert" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_method_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_ellipsis" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_qualified_allocation_expression" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_statements_compare_to_block" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_paren_in_parenthesized_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_type_header" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_parameters" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_type_arguments" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.keep_imple_if_on_one_line" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_between_empty_parens_in_method_invocation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_local_declarations" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_annotation_type_declaration" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_selector_in_method_invocation" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_body_declarations_compare_to_enum_constant_header" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_switch" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_assignment_operator" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.never_indent_line_comments_on_first_column" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_unary_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_if" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_colon_in_labeled_statement" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_bracket_in_array_allocation_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_switchstatements_compare_to_cases" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.continuation_indentation_for_array_initializer" value="2"/>
> <setting id="org.eclipse.jdt.core.formatter.comment.indent_root_tags" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_enum_constants" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_parameterized_type_reference" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_parenthesized_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_constructor_declaration_throws" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_throws_clause_in_constructor_declaration" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_method_invocation" value="16"/>
> <setting id="org.eclipse.jdt.core.formatter.tabulation.char" value="space"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_package" value="0"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_invocation_arguments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.indent_breaks_compare_to_cases" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_for_inits" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_multiple_field_declarations" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_superinterfaces" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.put_empty_statement_on_new_line" value="true"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_method_declaration_parameters" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.blank_lines_before_method" value="1"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_multiple_field_declarations" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_comma_in_for_inits" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_brace_in_anonymous_type_declaration" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_new_line_after_annotation" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_cast" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_closing_angle_bracket_in_type_arguments" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.never_indent_block_comments_on_first_column" value="false"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_comma_in_array_initializer" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_parenthesized_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_opening_paren_in_enum_constant" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_paren_in_method_invocation" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_postfix_operator" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.brace_position_for_block" value="end_of_line"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_after_opening_brace_in_array_initializer" value="insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_closing_bracket_in_array_allocation_expression" value="do not insert"/>
> <setting id="org.eclipse.jdt.core.formatter.insert_space_before_and_in_type_parameter" value="insert"/>
> </profile>
> </profiles>

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







Re: [jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by Jeff Eastman <je...@windwardsolutions.com>.
Grant Ingersoll wrote:
>
> On May 11, 2008, at 6:07 PM, Jeff Eastman (JIRA) wrote:
>>
>> - the pretty-printing rules are not those specified by ASF: the Java 
>> conventions with tabs replaced by 2 spaces vs. 4 spaces. The patch 
>> changes several of the existing canopy files formatting unnecessarily
>> - the patch introduces @author tags which are not according to ASF 
>> policy. These were likely added by Eclipse but should be removed
>
> Just to be clear, these are conventions we (Mahout/Lucene) adopt, not 
> necessarily ASF policy.  We can always remove/clean up before 
> committing.  The big thing about formatting is that we should try to 
> keep it clean going in, but we shouldn't necessarily change it once it 
> is in.
>
>
Thanks Grant,

This is a good clarification and I've revised my mental model 
accordingly. For those of us who are using Eclipse, it would be good to 
use the same formatter configuration, so I've attached the one I use. I 
tend to use the pretty printer quite often when I'm writing code and 
this one seems to leave the existing files intact. If folks are ok with 
it, I can post it to the wiki.

Jeff

Re: [jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by Grant Ingersoll <gs...@apache.org>.
On May 11, 2008, at 6:07 PM, Jeff Eastman (JIRA) wrote:
>
> - the pretty-printing rules are not those specified by ASF: the Java  
> conventions with tabs replaced by 2 spaces vs. 4 spaces. The patch  
> changes several of the existing canopy files formatting unnecessarily
> - the patch introduces @author tags which are not according to ASF  
> policy. These were likely added by Eclipse but should be removed

Just to be clear, these are conventions we (Mahout/Lucene) adopt, not  
necessarily ASF policy.  We can always remove/clean up before  
committing.  The big thing about formatting is that we should try to  
keep it clean going in, but we shouldn't necessarily change it once it  
is in.

[jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595941#action_12595941 ] 

Jeff Eastman commented on MAHOUT-54:
------------------------------------

I downloaded this patch and it installed cleanly, but I have several concerns about it:

- the patch introduces an entirely new canopykmeans package without much motivation. In particular, it is not clear what improvements it is suggesting for either canopy or kmeans
- there are no unit tests included that would indicate that the code produces correct results
- the pretty-printing rules are not those specified by ASF: the Java conventions with tabs replaced by 2 spaces vs. 4 spaces. The patch changes several of the existing canopy files formatting unnecessarily
- the patch introduces @author tags which are not according to ASF policy. These were likely added by Eclipse but should be removed

I would prefer to understand the logic changes which are being suggested first, then see a minimal patch to introduce such changes. This patch introduces an entirely new implementation that is derived from the original version, but cannot be easily compared with it. And, it has no associated tests.

I'm interested in understanding if logic improvements to either canopy or kmeans can be made, but from this patch it is too difficult to understand what is being proposed. Could you please try to be a little more systematic?

> parallelize k-means sharing the predominance of canopies
> --------------------------------------------------------
>
>                 Key: MAHOUT-54
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-54
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>         Environment: OS Independent
>            Reporter: Jeremy Chow
>             Fix For: 0.1
>
>         Attachments: canopykeams.patch
>
>
> The current Mahout implementation uses the canopy algorithm only to create the initial cluster centroids for k-means. While iterating, k-means then calculates the distance from each center to every point. But the most important improvement canopies offer is that the distance need only be calculated from each center to the much smaller number of points that lie in the same canopy.



[jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625716#action_12625716 ] 

Grant Ingersoll commented on MAHOUT-54:
---------------------------------------

Jeremy, 

Any response on this?  Would be good to get some resolution.

Thanks,
Grant

> parallelize k-means sharing the predominance of canopies
> --------------------------------------------------------
>
>                 Key: MAHOUT-54
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-54
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>         Environment: OS Independent
>            Reporter: Jeremy Chow
>             Fix For: 0.2
>
>         Attachments: canopykeams.patch
>
>
> The current Mahout implementation uses the canopy algorithm only to create the initial cluster centroids for k-means. While iterating, k-means then calculates the distance from each center to every point. But the most important improvement canopies offer is that the distance need only be calculated from each center to the much smaller number of points that lie in the same canopy.



[jira] Commented: (MAHOUT-54) parallelize k-means sharing the predominance of canopies

Posted by "Jeremy Chow (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595974#action_12595974 ] 

Jeremy Chow commented on MAHOUT-54:
-----------------------------------

I created a new package because the SequenceFile format output by my implementation differs from the original one, and I could not quickly reconcile the two.
The key idea of the canopy algorithm is that one can greatly reduce the number of distance computations required for clustering by first cheaply partitioning the data into overlapping subsets, and then only measuring distances between pairs of data points that belong to a common subset. We treat the distance between two points that never appear in the same canopy as infinite. I will add a unit test soon, but in the meantime you can try my version by following the steps below.
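The canopy-restricted assignment described above can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the patch or Mahout's API: the names `CanopyRestrictedAssignment`, `nearestCentroid` and `canopies` are hypothetical, canopy membership is modelled as a set of integer canopy ids per point and per centroid, and plain Euclidean distance stands in for `EuclideanDistanceMeasure`. A centroid that shares no canopy with the point is skipped, which is equivalent to treating its distance as infinite.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Mahout's API: nearest-centroid search restricted to shared canopies.
class CanopyRestrictedAssignment {

  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Convenience helper: builds a set of canopy ids.
  static Set<Integer> canopies(Integer... ids) {
    return new HashSet<>(Arrays.asList(ids));
  }

  // Returns the index of the nearest centroid that shares at least one canopy
  // with the point, or -1 if none does. Centroids in no common canopy are
  // treated as infinitely far away, so their distance is never computed.
  static int nearestCentroid(double[] point, Set<Integer> pointCanopies,
      List<double[]> centroids, List<Set<Integer>> centroidCanopies) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.size(); c++) {
      if (Collections.disjoint(pointCanopies, centroidCanopies.get(c))) {
        continue;
      }
      double d = euclidean(point, centroids.get(c));
      if (d < bestDistance) {
        bestDistance = d;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<double[]> centroids = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
    List<Set<Integer>> centroidCanopies = Arrays.asList(canopies(0), canopies(1));
    // The point lies only in canopy 0, so centroid 1 is skipped even though it is closer.
    System.out.println(nearestCentroid(new double[] {9, 9}, canopies(0), centroids, centroidCanopies));
  }
}
```

Running `main` prints `0`: centroid 1 is closer to (9, 9) but is never considered because it shares no canopy with the point. That is the approximation the canopy approach accepts in exchange for fewer distance computations.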

1. Prepare the data points to be clustered and the initial k-means centroids.
Here is a fragment of the input data points; the initial centroids use the same format.
[,125,256,6000,256,16,128,199,]
[,29,8000,32000,32,8,32,253,]
[,29,8000,32000,32,8,32,253,]
[,29,8000,32000,32,8,32,253,]
[,29,8000,16000,32,8,16,132,]
[,26,8000,32000,64,8,32,290,]
[,23,16000,32000,64,16,32,381,]
[,23,16000,32000,64,16,32,381,]
[,23,16000,64000,64,16,32,749,]
[,23,32000,64000,128,32,64,1238,]
[,400,1000,3000,0,1,2,23,]
[,400,512,3500,4,1,6,24,]
[,60,2000,8000,65,1,8,70,]
[,50,4000,16000,65,1,8,117,]
[,350,64,64,0,1,4,15,]
[,200,512,16000,0,4,32,64,]
[,167,524,2000,8,4,15,23,]
[,143,512,5000,0,7,32,29,]
[,143,1000,2000,0,5,16,22,]
[,110,5000,5000,142,8,64,124,]
[,143,1500,6300,0,5,32,35,]
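A small parser for the bracketed, comma-separated text format shown above may help when preparing input or checking results. It is a hypothetical helper (`VectorLineParser.parse` is not part of Mahout), written on the assumption that every line has exactly the `[,v1,v2,...,]` shape shown here; the SequenceFile-based input the patch actually reads may be encoded differently.

```java
import java.util.Arrays;

// Hypothetical helper for the "[,v1,v2,...,]" text format shown above.
class VectorLineParser {

  // Parses a line such as "[,29,8000,32000,32,8,32,253,]" into a double[].
  // Empty fields produced by the leading and trailing commas are skipped.
  static double[] parse(String line) {
    String trimmed = line.trim();
    if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
      throw new IllegalArgumentException("Not a bracketed vector: " + line);
    }
    String body = trimmed.substring(1, trimmed.length() - 1);
    return Arrays.stream(body.split(","))
        .map(String::trim)
        .filter(field -> !field.isEmpty())
        .mapToDouble(Double::parseDouble)
        .toArray();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parse("[,29,8000,32000,32,8,32,253,]")));
  }
}
```

`main` prints `[29.0, 8000.0, 32000.0, 32.0, 8.0, 32.0, 253.0]`. Because each field is trimmed before parsing, the method also accepts the space-separated vectors that appear in the k-means output.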

2. Create canopies from the input points, and place each initial k-means centroid into the canopy nearest to it.
bin/hadoop jar apache-mahout-*.jar org.apache.mahout.clustering.canopykmeans.CanopyKMeansInitialJob input cluster canopy_output  org.apache.mahout.utils.EuclideanDistanceMeasure 5000 3410
This step produces three output directories: canopies, clusters and points.
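Step 2 can be pictured with the classic two-threshold canopy procedure: take a remaining point as a new canopy center, add every remaining point within the loose threshold T1 to that canopy, and drop points within the tight threshold T2 from further consideration. Points between T2 and T1 can therefore join several canopies, which gives the overlapping subsets described earlier. The Java sketch below is an illustration, not the CanopyKMeansInitialJob code: the names are hypothetical, distances are Euclidean, and reading the two numeric arguments in the command above (5000 and 3410) as T1 and T2 is an assumption. Each initial centroid is placed in the canopy whose center is nearest to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: two-threshold canopy generation plus nearest-canopy placement.
class CanopySketch {

  static final class Canopy {
    final double[] center;
    final List<Integer> members = new ArrayList<>(); // indices of points within t1 of center

    Canopy(double[] center) {
      this.center = center;
    }
  }

  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  static List<Canopy> buildCanopies(List<double[]> points, double t1, double t2) {
    if (t2 <= 0 || t1 < t2) {
      throw new IllegalArgumentException("expected t1 >= t2 > 0");
    }
    List<Integer> remaining = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
      remaining.add(i);
    }
    List<Canopy> canopies = new ArrayList<>();
    while (!remaining.isEmpty()) {
      // The first remaining point seeds a new canopy; it is within t2 of itself, so it is removed.
      Canopy canopy = new Canopy(points.get(remaining.get(0)));
      List<Integer> stillRemaining = new ArrayList<>();
      for (int idx : remaining) {
        double d = euclidean(canopy.center, points.get(idx));
        if (d < t1) {
          canopy.members.add(idx); // loose threshold: the point joins this canopy
        }
        if (d >= t2) {
          stillRemaining.add(idx); // outside the tight threshold: may join or seed later canopies
        }
      }
      canopies.add(canopy);
      remaining = stillRemaining;
    }
    return canopies;
  }

  // Index of the canopy whose center is nearest to the given initial centroid.
  static int nearestCanopy(double[] centroid, List<Canopy> canopies) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < canopies.size(); i++) {
      double d = euclidean(centroid, canopies.get(i).center);
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {0, 0}, new double[] {1, 0}, new double[] {10, 10}, new double[] {11, 10});
    List<Canopy> canopies = buildCanopies(points, 3.0, 1.5);
    System.out.println(canopies.size() + " canopies; centroid (9,9) -> canopy "
        + nearestCanopy(new double[] {9, 9}, canopies));
  }
}
```

`main` builds two canopies from the four sample points and prints `2 canopies; centroid (9,9) -> canopy 1`.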


3. Cluster with k-means.

bin/hadoop jar apache-mahout-*.jar org.apache.mahout.clustering.canopykmeans.CanopyKMeansDriver canopy_output/points/ canopy_output/clusters/ kmeans_output/  org.apache.mahout.utils.EuclideanDistanceMeasure 0.00001 5
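Step 3 iterates k-means over the canopy-tagged points. The sketch below is an illustration under explicit assumptions rather than the CanopyKMeansDriver implementation: it reads the last two command arguments (0.00001 and 5) as a convergence delta and a maximum number of iterations, keeps each centroid's canopy membership fixed across iterations, and uses hypothetical names throughout. Each pass assigns points only to centroids that share a canopy with them, recomputes each centroid as the mean of its assigned points, and stops once no centroid moves more than the delta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: canopy-restricted k-means with a convergence delta and an iteration cap.
class CanopyKMeansLoop {

  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return Math.sqrt(sum);
  }

  static Set<Integer> canopies(Integer... ids) {
    return new HashSet<>(Arrays.asList(ids));
  }

  static int nearest(double[] point, Set<Integer> pointCanopies,
      List<double[]> centroids, List<Set<Integer>> centroidCanopies) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.size(); c++) {
      if (Collections.disjoint(pointCanopies, centroidCanopies.get(c))) {
        continue; // no shared canopy: treated as infinitely far
      }
      double d = euclidean(point, centroids.get(c));
      if (d < bestDistance) {
        bestDistance = d;
        best = c;
      }
    }
    return best;
  }

  static List<double[]> run(List<double[]> points, List<Set<Integer>> pointCanopies,
      List<double[]> initial, List<Set<Integer>> centroidCanopies,
      double convergenceDelta, int maxIterations) {
    int dims = points.get(0).length;
    List<double[]> centroids = new ArrayList<>();
    for (double[] c : initial) {
      centroids.add(c.clone()); // do not mutate the caller's initial centroids
    }
    for (int iter = 0; iter < maxIterations; iter++) {
      double[][] sums = new double[centroids.size()][dims];
      int[] counts = new int[centroids.size()];
      for (int i = 0; i < points.size(); i++) {
        int c = nearest(points.get(i), pointCanopies.get(i), centroids, centroidCanopies);
        if (c < 0) {
          continue; // the point shares no canopy with any centroid
        }
        counts[c]++;
        for (int d = 0; d < dims; d++) {
          sums[c][d] += points.get(i)[d];
        }
      }
      boolean converged = true;
      for (int c = 0; c < centroids.size(); c++) {
        if (counts[c] == 0) {
          continue; // an empty cluster keeps its previous centroid
        }
        double[] updated = new double[dims];
        for (int d = 0; d < dims; d++) {
          updated[d] = sums[c][d] / counts[c];
        }
        if (euclidean(updated, centroids.get(c)) > convergenceDelta) {
          converged = false;
        }
        centroids.set(c, updated);
      }
      if (converged) {
        break;
      }
    }
    return centroids;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {0, 0}, new double[] {2, 0}, new double[] {10, 10}, new double[] {12, 10});
    List<Set<Integer>> pointCanopies = Arrays.asList(canopies(0), canopies(0), canopies(1), canopies(1));
    List<double[]> initial = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
    List<Set<Integer>> centroidCanopies = Arrays.asList(canopies(0), canopies(1));
    for (double[] c : run(points, pointCanopies, initial, centroidCanopies, 0.00001, 5)) {
      System.out.println(Arrays.toString(c));
    }
  }
}
```

`main` prints `[1.0, 0.0]` and `[11.0, 10.0]`; the second pass moves neither centroid, so the loop stops before reaching the iteration limit.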

The output looks like this:
V[, 283.5496183206107, 1248.0610687022902, 5003.480916030534, 12.442748091603054, 2.7022900763358777, 13.083969465648854, 36.52671755725191, ] :C0  [, 900.0, 1000.0, 1000.0, 0.0, 1.0, 2.0, 18.0, ] 
V[, 283.5496183206107, 1248.0610687022902, 5003.480916030534, 12.442748091603054, 2.7022900763358777, 13.083969465648854, 36.52671755725191, ] :C0  [, 900.0, 1000.0, 4000.0, 4.0, 1.0, 2.0, 26.0, ]


I'm sorry; I am not familiar with the ASF conventions, such as the pretty-printing rules, the tab format and the discouraged @author tags. Could you point me to some material about them?
