Posted to commits@mahout.apache.org by pa...@apache.org on 2014/09/04 16:56:38 UTC
svn commit: r1622492 -
/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Author: pat
Date: Thu Sep 4 14:56:38 2014
New Revision: 1622492
URL: http://svn.apache.org/r1622492
Log:
updating cli help message
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1622492&r1=1622491&r2=1622492&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Thu Sep 4 14:56:38 2014
@@ -21,16 +21,15 @@ cross-cooccurrence is a more principled
to recommend.
- spark-itemsimilarity Mahout 1.0-SNAPSHOT
+ spark-itemsimilarity Mahout 1.0
Usage: spark-itemsimilarity [options]
Input, output options
-i <value> | --input <value>
- Input path, may be a filename, directory name, or comma delimited list of
- HDFS supported URIs (required)
+ Input path, may be a filename, directory name, or comma delimited list of HDFS supported URIs (required)
-i2 <value> | --input2 <value>
- Secondary input path for cross-similarity calculation, same restrictions
- as "--input" (optional). Default: empty.
+ Secondary input path for cross-similarity calculation, same restrictions as "--input" (optional). Default: empty.
-o <value> | --output <value>
Path for output, any local or HDFS supported URI (required)
@@ -38,8 +37,7 @@ to recommend.
-mppu <value> | --maxPrefs <value>
Max number of preferences to consider per user (optional). Default: 500
-m <value> | --maxSimilaritiesPerItem <value>
- Limit the number of similarities per item to this number (optional).
- Default: 100
+ Limit the number of similarities per item to this number (optional). Default: 100
Note: Only the Log Likelihood Ratio (LLR) is supported as a similarity measure.
@@ -47,56 +45,42 @@ to recommend.
-id <value> | --inDelim <value>
Input delimiter character (optional). Default: "[,\t]"
-f1 <value> | --filter1 <value>
- String (or regex) whose presence indicates a datum for the primary item
- set (optional). Default: no filter, all data is used
+ String (or regex) whose presence indicates a datum for the primary item set (optional). Default: no filter, all data is used
-f2 <value> | --filter2 <value>
- String (or regex) whose presence indicates a datum for the secondary item
- set (optional). If not present no secondary dataset is collected
- -rc <value> | --rowIDPosition <value>
- Column number (0 based Int) containing the row ID string (optional).
- Default: 0
- -ic <value> | --itemIDPosition <value>
- Column number (0 based Int) containing the item ID string (optional).
- Default: 1
- -fc <value> | --filterPosition <value>
- Column number (0 based Int) containing the filter string (optional).
- Default: -1 for no filter
+ String (or regex) whose presence indicates a datum for the secondary item set (optional). If not present no secondary dataset is collected
+ -rc <value> | --rowIDColumn <value>
+ Column number (0 based Int) containing the row ID string (optional). Default: 0
+ -ic <value> | --itemIDColumn <value>
+ Column number (0 based Int) containing the item ID string (optional). Default: 1
+ -fc <value> | --filterColumn <value>
+ Column number (0 based Int) containing the filter string (optional). Default: -1 for no filter
Using all defaults the input is expected of the form: "userID<tab>itemId" or "userID<tab>itemID<tab>any-text..." and all rows will be used
File discovery options:
-r | --recursive
- Searched the -i path recursively for files that match --filenamePattern
- (optional), default: false
+ Search the -i path recursively for files that match --filenamePattern (optional), Default: false
-fp <value> | --filenamePattern <value>
- Regex to match in determining input files (optional). Default: filename
- in the --input option or "^part-.*" if --input is a directory
+ Regex to match in determining input files (optional). Default: filename in the --input option or "^part-.*" if --input is a directory
Output text file schema options:
-rd <value> | --rowKeyDelim <value>
- Separates the rowID key from the vector values list (optional). Default:
- \t"
+ Separates the rowID key from the vector values list (optional). Default: "\t"
-cd <value> | --columnIdStrengthDelim <value>
- Separates column IDs from their values in the vector values list (optional).
- Default: ":"
+ Separates column IDs from their values in the vector values list (optional). Default: ":"
-td <value> | --elementDelim <value>
Separates vector element values in the values list (optional). Default: " "
-os | --omitStrength
Do not write the strength to the output files (optional), Default: false.
- This option is used to output indexable data for creating a search engine
- recommender.
+ This option is used to output indexable data for creating a search engine recommender.
Default delimiters will produce output of the form: "itemID1<tab>itemID2:value2<space>itemID10:value10..."
Spark config options:
-ma <value> | --master <value>
- Spark Master URL (optional). Default: "local". Note that you can specify
- the number of cores to get a performance improvement, for example "local[4]"
+ Spark Master URL (optional). Default: "local". Note that you can specify the number of cores to get a performance improvement, for example "local[4]"
-sem <value> | --sparkExecutorMem <value>
- Max Java heap available as "executor memory" on each node (optional).
- Default: 4g
-
- General config options:
+ Max Java heap available as "executor memory" on each node (optional). Default: 4g
-rs <value> | --randomSeed <value>
-h | --help
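For orientation, the options above can be exercised with an invocation along these lines. This is a hedged sketch, not part of the commit: the input/output paths and the presence of a `mahout` launcher on the PATH are assumptions, and the IDs in the sample file are made up. The input matches the default form documented above, "userID<tab>itemID".

```shell
# Build a tiny sample input in the default "userID<tab>itemID" form
# (user and item IDs here are invented for illustration).
printf 'u1\titemA\nu1\titemB\nu2\titemA\n' > /tmp/sample-actions.tsv

# Hypothetical run against a local Spark master with 4 cores, using the
# option names from the help text above; requires a Mahout build whose
# bin/mahout script is on the PATH, so it is left commented out here.
# mahout spark-itemsimilarity \
#   --input /tmp/sample-actions.tsv \
#   --output /tmp/indicators \
#   --master "local[4]" \
#   --maxSimilaritiesPerItem 100

# Show the first sample line to confirm the tab-delimited layout.
head -1 /tmp/sample-actions.tsv
```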
@@ -236,61 +220,48 @@ One significant output option is --omitS
The command line interface is:
- spark-rowsimilarity Mahout 1.0-SNAPSHOT
+ spark-rowsimilarity Mahout 1.0
Usage: spark-rowsimilarity [options]
Input, output options
-i <value> | --input <value>
- Input path, may be a filename, directory name, or comma delimited list
- of HDFS supported URIs (required)
- -o <value> | --output <value>
+ Input path, may be a filename, directory name, or comma delimited list of HDFS supported URIs (required)
+ -o <value> | --output <value>
Path for output, any local or HDFS supported URI (required)
Algorithm control options:
-mo <value> | --maxObservations <value>
Max number of observations to consider per row (optional). Default: 500
-m <value> | --maxSimilaritiesPerRow <value>
- Limit the number of similarities per item to this number (optional).
- Default: 100
+ Limit the number of similarities per item to this number (optional). Default: 100
Note: Only the Log Likelihood Ratio (LLR) is supported as a similarity measure.
Output text file schema options:
-rd <value> | --rowKeyDelim <value>
- Separates the rowID key from the vector values list (optional).
- Default: "\t"
+ Separates the rowID key from the vector values list (optional). Default: "\t"
-cd <value> | --columnIdStrengthDelim <value>
- Separates column IDs from their values in the vector values list
- (optional). Default: ":"
+ Separates column IDs from their values in the vector values list (optional). Default: ":"
-td <value> | --elementDelim <value>
- Separates vector element values in the values list (optional).
- Default: " "
+ Separates vector element values in the values list (optional). Default: " "
-os | --omitStrength
- Do not write the strength to the output files (optional), Default:
- false.
- This option is used to output indexable data for creating a search engine
- recommender.
+ Do not write the strength to the output files (optional), Default: false.
+ This option is used to output indexable data for creating a search engine recommender.
Default delimiters will produce output of the form: "itemID1<tab>itemID2:value2<space>itemID10:value10..."
File discovery options:
-r | --recursive
- Searched the -i path recursively for files that match
- --filenamePattern (optional), Default: false
+ Search the -i path recursively for files that match --filenamePattern (optional), Default: false
-fp <value> | --filenamePattern <value>
- Regex to match in determining input files (optional). Default:
- filename in the --input option or "^part-.*" if --input is a directory
+ Regex to match in determining input files (optional). Default: filename in the --input option or "^part-.*" if --input is a directory
Spark config options:
-ma <value> | --master <value>
- Spark Master URL (optional). Default: "local". Note that you can
- specify the number of cores to get a performance improvement, for
- example "local[4]"
+ Spark Master URL (optional). Default: "local". Note that you can specify the number of cores to get a performance improvement, for example "local[4]"
-sem <value> | --sparkExecutorMem <value>
- Max Java heap available as "executor memory" on each node (optional).
- Default: 4g
-
- General config options:
+ Max Java heap available as "executor memory" on each node (optional). Default: 4g
-rs <value> | --randomSeed <value>
-h | --help
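As with the item-similarity tool, a sketch of a spark-rowsimilarity run follows. The paths are hypothetical, and the input layout shown — rowID, then space-delimited columnID:strength pairs, matching the delimiters documented above (`--rowKeyDelim "\t"`, `--columnIdStrengthDelim ":"`, `--elementDelim " "`) — is an assumption about the expected input, with invented IDs.

```shell
# Assumed input layout: one row per line, rowID<tab>col:val<space>col:val...
# mirroring the output schema delimiters described in the help text above.
printf 'itemA\tu1:1 u2:1\nitemB\tu1:1\n' > /tmp/sample-matrix.tsv

# Hypothetical invocation; requires a Mahout build with bin/mahout on the
# PATH, so it is left commented out here.
# mahout spark-rowsimilarity \
#   --input /tmp/sample-matrix.tsv \
#   --output /tmp/row-sims \
#   --master local

# Confirm the sample matrix has one line per row.
wc -l < /tmp/sample-matrix.tsv
```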