You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by stefanobaghino <gi...@git.apache.org> on 2016/01/21 12:47:45 UTC

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

GitHub user stefanobaghino opened a pull request:

    https://github.com/apache/flink/pull/1536

    [FLINK-2021] Rework examples to use ParameterTool (K-Means)

    I've reworked the K-Means examples in Java and Scala to get a feeling of what it would mean on the whole examples code base. Just a remark: I've caught the `RequiredParametersException` on the requirements application but not on their definition because I've felt the errors are different in nature (the application depends on user input while the definition should be somehow caught sooner — it would be great to have a way to catch it at compile-time).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/radicalbit/flink 2021

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1536
    
----
commit f68cb95af1c27dd8bf99e1a2374969915dd567f9
Author: Stefano Baghino <st...@baghino.me>
Date:   2016-01-21T11:20:49Z

    [FLINK-2021] Rework the K-Means examples to use built-in argument parsing facilities instead of ad-hoc functions

commit a432cc204741dd0a2d505886b987c4113e781c84
Author: Stefano Baghino <st...@baghino.me>
Date:   2016-01-21T11:33:15Z

    [FLINK-2021] Renamed user-facing names for options

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50395017
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    --- End diff --
    
    Does this method throw `RequiredParametersException`? The exception seems caught in `catch` statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174991021
  
    Good point! When the example was added, only Tuples were supported by CsvInputFormats. In the meantime, we Pojo support was added for CsvInputFormats. IMO, it makes sense to use this feature for this example.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174950409
  
    Sorry for the late notice, but why did we move the code to fetch the data to the top of the main function? 
    IMO, it distracts from the main part of the example code, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50399628
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -104,36 +105,52 @@ object KMeans {
     
       }
     
    -  private def parseParameters(programArguments: Array[String]): Boolean = {
    -    if (programArguments.length > 0) {
    +  private val POINTS_PATH_OPTION: Option =
    +    new Option("points").alt("P").help("The path to the input points")
    +  private val CENTERS_PATH_OPTION: Option =
    +    new Option("centroids").alt("C").help("The path to the input centroids")
    +  private val OUTPUT_PATH_OPTION: Option =
    +    new Option("output").alt("O").help("The path where the output will be written")
    +  private val NUM_ITERATIONS_OPTION: Option =
    +    new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm")
    +
    +  @throws(classOf[RequiredParametersException])
    +  private def parseParameters(params: ParameterTool): Boolean = {
    +    val requiredParameters: RequiredParameters = new RequiredParameters
    +    var parseStatus: Boolean = false
    +    requiredParameters.add(POINTS_PATH_OPTION)
    +    requiredParameters.add(CENTERS_PATH_OPTION)
    +    requiredParameters.add(OUTPUT_PATH_OPTION)
    +    requiredParameters.add(NUM_ITERATIONS_OPTION)
    +    try {
    +      requiredParameters.applyTo(params)
    +      pointsPath = params.get(POINTS_PATH_OPTION.getName)
    +      centersPath = params.get(CENTERS_PATH_OPTION.getName)
    +      outputPath = params.get(OUTPUT_PATH_OPTION.getName)
    +      numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName)
           fileOutput = true
    -      if (programArguments.length == 4) {
    -        pointsPath = programArguments(0)
    -        centersPath = programArguments(1)
    -        outputPath = programArguments(2)
    -        numIterations = Integer.parseInt(programArguments(3))
    -
    -        true
    -      }
    -      else {
    -        System.err.println("Usage: KMeans <points path> <centers path> <result path> <num " +
    -          "iterations>")
    -
    -        false
    +      parseStatus = true
    +    }
    +    catch {
    +      case e: RequiredParametersException => {
    +        if (params.getNumberOfParameters == 0) {
    +          printRunWithDefaultParams()
    +          parseStatus = true
    +        }
    +        else {
    +          println(requiredParameters.getHelp(e.getMissingArguments))
    +        }
           }
         }
    -    else {
    -      System.out.println("Executing K-Means example with default parameters and built-in default " +
    -        "data.")
    -      System.out.println("  Provide parameters to read input data from files.")
    -      System.out.println("  See the documentation for the correct format of input files.")
    -      System.out.println("  We provide a data generator to create synthetic input files for this " +
    -        "program.")
    -      System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num " +
    -        "iterations>")
    +    return parseStatus
    --- End diff --
    
    Ok, I kept because I tend to do it on "Java-like" code, but I'll fix this as well. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-173871987
  
    It seems like your improvements have had a positive feedback from the ML. Shall I proceed replicating those on other examples as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174962013
  
    Thanks @stefanobaghino!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174950736
  
    I think its important that users also see how to read data from sources.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174950907
  
    Right, but this is boilerplate code would still be accessible at the bottom of the file. It is not gone.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50394801
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    +
    +		final RequiredParameters requiredParameters = new RequiredParameters();
    +		boolean parseStatus = false;
    +
    +		requiredParameters.add(POINTS_PATH_OPTION);
    +		requiredParameters.add(CENTERS_PATH_OPTION);
    +		requiredParameters.add(OUTPUT_PATH_OPTION);
    +		requiredParameters.add(NUM_ITERATIONS_OPTION);
    +
    +		try {
    +			requiredParameters.applyTo(params);
    +			pointsPath = params.get(POINTS_PATH_OPTION.getName());
    +			centersPath = params.get(CENTERS_PATH_OPTION.getName());
    +			outputPath = params.get(OUTPUT_PATH_OPTION.getName());
    +			numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName());
     			fileOutput = true;
    -			if(programArguments.length == 4) {
    -				pointsPath = programArguments[0];
    -				centersPath = programArguments[1];
    -				outputPath = programArguments[2];
    -				numIterations = Integer.parseInt(programArguments[3]);
    +			parseStatus = true;
    +		} catch (RequiredParametersException e) {
    +			if (params.getNumberOfParameters() == 0) {
    +				printRunWithDefaultParams();
    +				parseStatus = true;
     			} else {
    -				System.err.println("Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    -				return false;
    +				System.out.println(requiredParameters.getHelp(e.getMissingArguments()));
     			}
    -		} else {
    -			System.out.println("Executing K-Means example with default parameters and built-in default data.");
    -			System.out.println("  Provide parameters to read input data from files.");
    -			System.out.println("  See the documentation for the correct format of input files.");
    -			System.out.println("  We provide a data generator to create synthetic input files for this program.");
    -			System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
     		}
    -		return true;
    +
    +		return parseStatus;
     	}
    -	
    +
    +	private static void printRunWithDefaultParams() {
    +		System.out.println("Executing K-Means example with default parameters and built-in default data.");
    +		System.out.println("  Provide parameters to read input data from files.");
    +		System.out.println("  See the documentation for the correct format of input files.");
    +		System.out.println("  We provide a data generator to create synthetic input files for this program.");
    +		System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    --- End diff --
    
    This parameter description should be updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174953115
  
    I don't really see the points of encapsulating parts of the examples into methods: Usually, methods are used when functionality is used in different places. In this case, every method is only called once (there is no generic abstraction). Its adding an unnecessary level of indirection, making it harder to read the code.
    I don't see why reading the initial dataset is less important than the rest of the application. 
    
    Reading the file is just 5 lines of code (relevant code, I woudn't call that boilerplate).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-173876414
  
    I'll take some time. I've spotted another possible issue in the docs, I'll look into it as we gain more feedback from the mailing list. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-173612372
  
    Nice to see it reworked. Thanks for the contribution. I think Robert raises a good point with the readability improvements though. I would be in favour of Robert's take on the ParameterTool. ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-177954808
  
    Since it looks like I've gathered enough feedback, I'm closing the PR so that I can keep working on it and not burden on the CI system. Sorry for the WIP PR, I'll reopen it as it's presumably ready for merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino closed the pull request at:

    https://github.com/apache/flink/pull/1536


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50395062
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -104,36 +105,52 @@ object KMeans {
     
       }
     
    -  private def parseParameters(programArguments: Array[String]): Boolean = {
    -    if (programArguments.length > 0) {
    +  private val POINTS_PATH_OPTION: Option =
    +    new Option("points").alt("P").help("The path to the input points")
    +  private val CENTERS_PATH_OPTION: Option =
    +    new Option("centroids").alt("C").help("The path to the input centroids")
    +  private val OUTPUT_PATH_OPTION: Option =
    +    new Option("output").alt("O").help("The path where the output will be written")
    +  private val NUM_ITERATIONS_OPTION: Option =
    +    new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm")
    +
    +  @throws(classOf[RequiredParametersException])
    --- End diff --
    
    This statement doesn't need also. We catch `RequiredParametersException` in `catch` statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50406845
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    --- End diff --
    
    I thought about it but was afraid of polluting the ´main´ method with "secondary" concerns. I'm going to fix this right now. I'll keep this in mind for other examples as well, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50401269
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    --- End diff --
    
    I would actually remove the `parseParameters` method


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50395674
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    --- End diff --
    
    Oh, `add` method in `RequiredParameters` can throw `RequireParametersException`. Ignore my previous comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174951427
  
    Maybe I can rework the example so that secondary needs (like parameter parsing and source reading) are encapsulated in functions kept at the bottom, while eventually adding one further example with a spotlights on these secondary (but important) tasks. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50401470
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    +
    +		final RequiredParameters requiredParameters = new RequiredParameters();
    +		boolean parseStatus = false;
    +
    +		requiredParameters.add(POINTS_PATH_OPTION);
    +		requiredParameters.add(CENTERS_PATH_OPTION);
    +		requiredParameters.add(OUTPUT_PATH_OPTION);
    +		requiredParameters.add(NUM_ITERATIONS_OPTION);
    +
    +		try {
    +			requiredParameters.applyTo(params);
    +			pointsPath = params.get(POINTS_PATH_OPTION.getName());
    +			centersPath = params.get(CENTERS_PATH_OPTION.getName());
    +			outputPath = params.get(OUTPUT_PATH_OPTION.getName());
    +			numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName());
     			fileOutput = true;
    -			if(programArguments.length == 4) {
    -				pointsPath = programArguments[0];
    -				centersPath = programArguments[1];
    -				outputPath = programArguments[2];
    -				numIterations = Integer.parseInt(programArguments[3]);
    +			parseStatus = true;
    +		} catch (RequiredParametersException e) {
    +			if (params.getNumberOfParameters() == 0) {
    +				printRunWithDefaultParams();
    +				parseStatus = true;
     			} else {
    -				System.err.println("Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    -				return false;
    +				System.out.println(requiredParameters.getHelp(e.getMissingArguments()));
     			}
    -		} else {
    -			System.out.println("Executing K-Means example with default parameters and built-in default data.");
    -			System.out.println("  Provide parameters to read input data from files.");
    -			System.out.println("  See the documentation for the correct format of input files.");
    -			System.out.println("  We provide a data generator to create synthetic input files for this program.");
    -			System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
     		}
    -		return true;
    +
    +		return parseStatus;
     	}
    -	
    +
    +	private static void printRunWithDefaultParams() {
    +		System.out.println("Executing K-Means example with default parameters and built-in default data.");
    +		System.out.println("  Provide parameters to read input data from files.");
    +		System.out.println("  See the documentation for the correct format of input files.");
    +		System.out.println("  We provide a data generator to create synthetic input files for this program.");
    +		System.out.println("  Usage: KMeans --points <points path> --centroids <centers path> --output <result path> --iterations <num iterations>");
    --- End diff --
    
    I think you can generate the usage information from the required parameters


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-179225452
  
    Branch recreated to remove merge commits, now opened as #1581.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50399418
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    +
    +		final RequiredParameters requiredParameters = new RequiredParameters();
    +		boolean parseStatus = false;
    +
    +		requiredParameters.add(POINTS_PATH_OPTION);
    +		requiredParameters.add(CENTERS_PATH_OPTION);
    +		requiredParameters.add(OUTPUT_PATH_OPTION);
    +		requiredParameters.add(NUM_ITERATIONS_OPTION);
    +
    +		try {
    +			requiredParameters.applyTo(params);
    +			pointsPath = params.get(POINTS_PATH_OPTION.getName());
    +			centersPath = params.get(CENTERS_PATH_OPTION.getName());
    +			outputPath = params.get(OUTPUT_PATH_OPTION.getName());
    +			numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName());
     			fileOutput = true;
    -			if(programArguments.length == 4) {
    -				pointsPath = programArguments[0];
    -				centersPath = programArguments[1];
    -				outputPath = programArguments[2];
    -				numIterations = Integer.parseInt(programArguments[3]);
    +			parseStatus = true;
    +		} catch (RequiredParametersException e) {
    +			if (params.getNumberOfParameters() == 0) {
    +				printRunWithDefaultParams();
    +				parseStatus = true;
     			} else {
    -				System.err.println("Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    -				return false;
    +				System.out.println(requiredParameters.getHelp(e.getMissingArguments()));
     			}
    -		} else {
    -			System.out.println("Executing K-Means example with default parameters and built-in default data.");
    -			System.out.println("  Provide parameters to read input data from files.");
    -			System.out.println("  See the documentation for the correct format of input files.");
    -			System.out.println("  We provide a data generator to create synthetic input files for this program.");
    -			System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
     		}
    -		return true;
    +
    +		return parseStatus;
     	}
    -	
    +
    +	private static void printRunWithDefaultParams() {
    +		System.out.println("Executing K-Means example with default parameters and built-in default data.");
    +		System.out.println("  Provide parameters to read input data from files.");
    +		System.out.println("  See the documentation for the correct format of input files.");
    +		System.out.println("  We provide a data generator to create synthetic input files for this program.");
    +		System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    --- End diff --
    
    Thanks for making me notice it, I'll fix it right away!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174966751
  
    This can be a starting point for further discussion. The only weak point I see (but maybe I'm wrong) is that right now we're reading `Tuple`s and then converting them to `Point`s and `Centroid`s. Since one of the stated aims of the example is to showcase the usage of POJOs, should I also use the `pojoType` method on input sources and drop the `TuplePointConverter` and `TupleCentroidConverter`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-176675459
  
    Thanks for the update! 
    +1 using case classes in the Scala example. Can you also check if we can directly read the CSV file into the case classes (not sure if that works).
    
    Looks good otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50498429
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -26,53 +27,84 @@ import org.apache.flink.examples.java.clustering.util.KMeansData
     import scala.collection.JavaConverters._
     
     /**
    - * This example implements a basic K-Means clustering algorithm.
    - *
    - * K-Means is an iterative clustering algorithm and works as follows:
    - * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
    - * centers.
    - * In each iteration, the algorithm computes the distance of each data point to each cluster center.
    - * Each point is assigned to the cluster center which is closest to it.
    - * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
    - * been assigned to it.
    - * The moved cluster centers are fed into the next iteration. 
    - * The algorithm terminates after a fixed number of iterations (as in this implementation) 
    - * or if cluster centers do not (significantly) move in an iteration.
    - * This is the Wikipedia entry for the [[http://en.wikipedia
    - * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
    - *
    - * This implementation works on two-dimensional data points.
    - * It computes an assignment of data points to cluster centers, i.e., 
    - * each data point is annotated with the id of the final cluster (center) it belongs to.
    - *
    - * Input files are plain text files and must be formatted as follows:
    - *
    - *  - Data points are represented as two double values separated by a blank character.
    - *    Data points are separated by newline characters.
    - *    For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
    - *    y=7.2).
    - *  - Cluster centers are represented by an integer id and a point value.
    - *    For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
    - *    y=3.2) and (id=2, x=2.9, y=5.7).
    - *
    - * Usage:
    - * {{{
    - *   KMeans <points path> <centers path> <result path> <num iterations>
    - * }}}
    - * If no parameters are provided, the program is run with default data from
    - * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
    - * and 10 iterations.
    - *
    - * This example shows how to use:
    - *
    - *  - Bulk iterations
    - *  - Broadcast variables in bulk iterations
    - *  - Custom Java objects (PoJos)
    - */
    +  * This example implements a basic K-Means clustering algorithm.
    +  *
    +  * K-Means is an iterative clustering algorithm and works as follows:
    +  * K-Means is given a set of data points to be clustered and an initial set of ''K'' cluster
    +  * centers.
    +  * In each iteration, the algorithm computes the distance of each data point to each cluster center.
    +  * Each point is assigned to the cluster center which is closest to it.
    +  * Subsequently, each cluster center is moved to the center (''mean'') of all points that have
    +  * been assigned to it.
    +  * The moved cluster centers are fed into the next iteration.
    +  * The algorithm terminates after a fixed number of iterations (as in this implementation)
    +  * or if cluster centers do not (significantly) move in an iteration.
    +  * This is the Wikipedia entry for the [[http://en.wikipedia
    +  * .org/wiki/K-means_clustering K-Means Clustering algorithm]].
    +  *
    +  * This implementation works on two-dimensional data points.
    +  * It computes an assignment of data points to cluster centers, i.e.,
    +  * each data point is annotated with the id of the final cluster (center) it belongs to.
    +  *
    +  * Input files are plain text files and must be formatted as follows:
    +  *
    +  * - Data points are represented as two double values separated by a blank character.
    +  * Data points are separated by newline characters.
    +  * For example `"1.2 2.3\n5.3 7.2\n"` gives two data points (x=1.2, y=2.3) and (x=5.3,
    +  * y=7.2).
    +  * - Cluster centers are represented by an integer id and a point value.
    +  * For example `"1 6.2 3.2\n2 2.9 5.7\n"` gives two centers (id=1, x=6.2,
    +  * y=3.2) and (id=2, x=2.9, y=5.7).
    +  *
    +  * Usage:
    +  * {{{
    +  *   KMeans <points path> <centers path> <result path> <num iterations>
    +  * }}}
    +  * If no parameters are provided, the program is run with default data from
    +  * [[org.apache.flink.examples.java.clustering.util.KMeansData]]
    +  * and 10 iterations.
    +  *
    +  * This example shows how to use:
    +  *
    +  * - Bulk iterations
    +  * - Broadcast variables in bulk iterations
    +  * - Custom Java objects (PoJos)
    --- End diff --
    
    We're using "Scala objects". Could you change this line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174946246
  
    Okay, since nobody objected on the mailing list, we can start changing all examples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-176700143
  
    It works, I'll be pushing the code in a few minutes. If everything looks good I can move on to the other examples. Thank you all the precious feedback provided so far.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50394812
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -104,36 +105,52 @@ object KMeans {
     
       }
     
    -  private def parseParameters(programArguments: Array[String]): Boolean = {
    -    if (programArguments.length > 0) {
    +  private val POINTS_PATH_OPTION: Option =
    +    new Option("points").alt("P").help("The path to the input points")
    +  private val CENTERS_PATH_OPTION: Option =
    +    new Option("centroids").alt("C").help("The path to the input centroids")
    +  private val OUTPUT_PATH_OPTION: Option =
    +    new Option("output").alt("O").help("The path where the output will be written")
    +  private val NUM_ITERATIONS_OPTION: Option =
    +    new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm")
    +
    +  @throws(classOf[RequiredParametersException])
    +  private def parseParameters(params: ParameterTool): Boolean = {
    +    val requiredParameters: RequiredParameters = new RequiredParameters
    +    var parseStatus: Boolean = false
    +    requiredParameters.add(POINTS_PATH_OPTION)
    +    requiredParameters.add(CENTERS_PATH_OPTION)
    +    requiredParameters.add(OUTPUT_PATH_OPTION)
    +    requiredParameters.add(NUM_ITERATIONS_OPTION)
    +    try {
    +      requiredParameters.applyTo(params)
    +      pointsPath = params.get(POINTS_PATH_OPTION.getName)
    +      centersPath = params.get(CENTERS_PATH_OPTION.getName)
    +      outputPath = params.get(OUTPUT_PATH_OPTION.getName)
    +      numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName)
           fileOutput = true
    -      if (programArguments.length == 4) {
    -        pointsPath = programArguments(0)
    -        centersPath = programArguments(1)
    -        outputPath = programArguments(2)
    -        numIterations = Integer.parseInt(programArguments(3))
    -
    -        true
    -      }
    -      else {
    -        System.err.println("Usage: KMeans <points path> <centers path> <result path> <num " +
    -          "iterations>")
    -
    -        false
    +      parseStatus = true
    +    }
    +    catch {
    +      case e: RequiredParametersException => {
    +        if (params.getNumberOfParameters == 0) {
    +          printRunWithDefaultParams()
    +          parseStatus = true
    +        }
    +        else {
    +          println(requiredParameters.getHelp(e.getMissingArguments))
    +        }
           }
         }
    -    else {
    -      System.out.println("Executing K-Means example with default parameters and built-in default " +
    -        "data.")
    -      System.out.println("  Provide parameters to read input data from files.")
    -      System.out.println("  See the documentation for the correct format of input files.")
    -      System.out.println("  We provide a data generator to create synthetic input files for this " +
    -        "program.")
    -      System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num " +
    -        "iterations>")
    +    return parseStatus
    +  }
     
    -      true
    -    }
    +  private def printRunWithDefaultParams() {
    +    println("Executing K-Means example with default parameters and built-in default data.")
    +    println("  Provide parameters to read input data from files.")
    +    println("  See the documentation for the correct format of input files.")
    +    println("  We provide a data generator to create synthetic input files for this program.")
    +    println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>")
    --- End diff --
    
    This parameter description should be updated also.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50402286
  
    --- Diff: flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java ---
    @@ -291,31 +297,54 @@ public Centroid map(Tuple3<Integer, Point, Long> value) {
     	private static String centersPath = null;
     	private static String outputPath = null;
     	private static int numIterations = 10;
    -	
    -	private static boolean parseParameters(String[] programArguments) {
    -		
    -		if(programArguments.length > 0) {
    -			// parse input arguments
    +
    +	private static final Option POINTS_PATH_OPTION =
    +		new Option("points").alt("P").help("The path to the input points");
    +	private static final Option CENTERS_PATH_OPTION =
    +		new Option("centroids").alt("C").help("The path to the input centroids");
    +	private static final Option OUTPUT_PATH_OPTION =
    +		new Option("output").alt("O").help("The path where the output will be written");
    +	private static final Option NUM_ITERATIONS_OPTION =
    +		new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm");
    +
    +	private static boolean parseParameters(final ParameterTool params) throws RequiredParametersException {
    +
    +		final RequiredParameters requiredParameters = new RequiredParameters();
    +		boolean parseStatus = false;
    +
    +		requiredParameters.add(POINTS_PATH_OPTION);
    +		requiredParameters.add(CENTERS_PATH_OPTION);
    +		requiredParameters.add(OUTPUT_PATH_OPTION);
    +		requiredParameters.add(NUM_ITERATIONS_OPTION);
    +
    +		try {
    +			requiredParameters.applyTo(params);
    +			pointsPath = params.get(POINTS_PATH_OPTION.getName());
    +			centersPath = params.get(CENTERS_PATH_OPTION.getName());
    +			outputPath = params.get(OUTPUT_PATH_OPTION.getName());
    +			numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName());
     			fileOutput = true;
    -			if(programArguments.length == 4) {
    -				pointsPath = programArguments[0];
    -				centersPath = programArguments[1];
    -				outputPath = programArguments[2];
    -				numIterations = Integer.parseInt(programArguments[3]);
    +			parseStatus = true;
    +		} catch (RequiredParametersException e) {
    +			if (params.getNumberOfParameters() == 0) {
    +				printRunWithDefaultParams();
    +				parseStatus = true;
     			} else {
    -				System.err.println("Usage: KMeans <points path> <centers path> <result path> <num iterations>");
    -				return false;
    +				System.out.println(requiredParameters.getHelp(e.getMissingArguments()));
     			}
    -		} else {
    -			System.out.println("Executing K-Means example with default parameters and built-in default data.");
    -			System.out.println("  Provide parameters to read input data from files.");
    -			System.out.println("  See the documentation for the correct format of input files.");
    -			System.out.println("  We provide a data generator to create synthetic input files for this program.");
    -			System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num iterations>");
     		}
    -		return true;
    +
    +		return parseStatus;
     	}
    -	
    +
    +	private static void printRunWithDefaultParams() {
    +		System.out.println("Executing K-Means example with default parameters and built-in default data.");
    +		System.out.println("  Provide parameters to read input data from files.");
    +		System.out.println("  See the documentation for the correct format of input files.");
    +		System.out.println("  We provide a data generator to create synthetic input files for this program.");
    +		System.out.println("  Usage: KMeans --points <points path> --centroids <centers path> --output <result path> --iterations <num iterations>");
    --- End diff --
    
    Clever! I'll do it, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50400122
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -104,36 +105,52 @@ object KMeans {
     
       }
     
    -  private def parseParameters(programArguments: Array[String]): Boolean = {
    -    if (programArguments.length > 0) {
    +  private val POINTS_PATH_OPTION: Option =
    +    new Option("points").alt("P").help("The path to the input points")
    +  private val CENTERS_PATH_OPTION: Option =
    +    new Option("centroids").alt("C").help("The path to the input centroids")
    +  private val OUTPUT_PATH_OPTION: Option =
    +    new Option("output").alt("O").help("The path where the output will be written")
    +  private val NUM_ITERATIONS_OPTION: Option =
    +    new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm")
    +
    +  @throws(classOf[RequiredParametersException])
    --- End diff --
    
    Should I ignore this remark, too? You can find a more detailed explanation of why I kept the checked exceptions for the `add` calls in the PR description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174950384
  
    Good, I'll keep on working on the PR with the other examples, thank you for the feedback and for the guidance provided so far. :smiley: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by stefanobaghino <gi...@git.apache.org>.
Github user stefanobaghino commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174960226
  
    I'll edit the code so that we can all read the result and maybe discuss about the pros and cons of this approach. Again, thank you both for the feedback.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by chiwanpark <gi...@git.apache.org>.
Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1536#discussion_r50394888
  
    --- Diff: flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala ---
    @@ -104,36 +105,52 @@ object KMeans {
     
       }
     
    -  private def parseParameters(programArguments: Array[String]): Boolean = {
    -    if (programArguments.length > 0) {
    +  private val POINTS_PATH_OPTION: Option =
    +    new Option("points").alt("P").help("The path to the input points")
    +  private val CENTERS_PATH_OPTION: Option =
    +    new Option("centroids").alt("C").help("The path to the input centroids")
    +  private val OUTPUT_PATH_OPTION: Option =
    +    new Option("output").alt("O").help("The path where the output will be written")
    +  private val NUM_ITERATIONS_OPTION: Option =
    +    new Option("iterations").alt("I").help("The number of iteration performed by the K-Means algorithm")
    +
    +  @throws(classOf[RequiredParametersException])
    +  private def parseParameters(params: ParameterTool): Boolean = {
    +    val requiredParameters: RequiredParameters = new RequiredParameters
    +    var parseStatus: Boolean = false
    +    requiredParameters.add(POINTS_PATH_OPTION)
    +    requiredParameters.add(CENTERS_PATH_OPTION)
    +    requiredParameters.add(OUTPUT_PATH_OPTION)
    +    requiredParameters.add(NUM_ITERATIONS_OPTION)
    +    try {
    +      requiredParameters.applyTo(params)
    +      pointsPath = params.get(POINTS_PATH_OPTION.getName)
    +      centersPath = params.get(CENTERS_PATH_OPTION.getName)
    +      outputPath = params.get(OUTPUT_PATH_OPTION.getName)
    +      numIterations = params.getInt(NUM_ITERATIONS_OPTION.getName)
           fileOutput = true
    -      if (programArguments.length == 4) {
    -        pointsPath = programArguments(0)
    -        centersPath = programArguments(1)
    -        outputPath = programArguments(2)
    -        numIterations = Integer.parseInt(programArguments(3))
    -
    -        true
    -      }
    -      else {
    -        System.err.println("Usage: KMeans <points path> <centers path> <result path> <num " +
    -          "iterations>")
    -
    -        false
    +      parseStatus = true
    +    }
    +    catch {
    +      case e: RequiredParametersException => {
    +        if (params.getNumberOfParameters == 0) {
    +          printRunWithDefaultParams()
    +          parseStatus = true
    +        }
    +        else {
    +          println(requiredParameters.getHelp(e.getMissingArguments))
    +        }
           }
         }
    -    else {
    -      System.out.println("Executing K-Means example with default parameters and built-in default " +
    -        "data.")
    -      System.out.println("  Provide parameters to read input data from files.")
    -      System.out.println("  See the documentation for the correct format of input files.")
    -      System.out.println("  We provide a data generator to create synthetic input files for this " +
    -        "program.")
    -      System.out.println("  Usage: KMeans <points path> <centers path> <result path> <num " +
    -        "iterations>")
    +    return parseStatus
    --- End diff --
    
    In Scala, we don't need `return` keyword. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-173607167
  
    I have quickly checked out your branch and came up with another variant for reworking the KMeans example with the `ParameterTool`: https://github.com/rmetzger/flink/blob/4c3b569b522172043fe1f49da2858bae37519eef/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
    
    I didn't use the required parameters, because they make the code too clumsy.
    I also think that getting rid of all these static variables and methods is a great improvement in readability. Users new to Flink should be able to focus on understanding the Flink's APIs and abstractions and not be distracted by our examples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-173874750
  
    mh, I understand that you are motivated to move on with this task. I think most likely other Flink committers agree to the change, but it might also happen that the discussion is not over yet.
    Worst case: You'll have to redo some of the changes because the discussion goes into a different direction.
    If you are confident enough and you are aware of the risks, you can also do it ... its up to you :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-2021] Rework examples to use ParameterT...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1536#issuecomment-174959098
  
    The examples handle inputs differently than most other DataSet programs because they use default input data if no input paths are specified. This adds complexity which is not present in common programs. 
    
    I agree that reading input is an important aspect of a DataSet program, but adding 15 lines for each input to each example seems unnecessary to me.
    
    Extracting code into methods can be good practice, even if a method is only called once. Reading code can become a lot easier if it is clear what a method does (good method name + comments) because it can hide all little nasty details and helps to focus on the relevant parts. 
    The only question is how do we define relevance ;-)
    
    @stefanobaghino, I think this is a good idea. I would keep the parameter parsing in the main function of the examples, but only move the code to either read a CSV file or get a collection DataSource to a method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---