Posted to user@mahout.apache.org by Chris Harrington <ch...@heystaks.com> on 2013/03/12 13:37:47 UTC

ClusterDumper writing to local instead of HDFS

Hi all,

The subject line says it all: ClusterDumper is writing to the local file system instead of HDFS.

After looking at the source:

From the ClusterDumper class:

if (this.outputFile == null) {
  // no output file given: dump to stdout
  shouldClose = false;
  writer = new OutputStreamWriter(System.out);
} else {
  shouldClose = true;
  if (outputFile.getName().startsWith("s3n://")) {
    // only s3n:// paths go through the Hadoop FileSystem API
    Path p = outputPath;
    FileSystem fs = FileSystem.get(p.toUri(), conf);
    writer = new OutputStreamWriter(fs.create(p), Charsets.UTF_8);
  } else {
    // everything else goes through java.io, i.e. the local filesystem
    writer = Files.newWriter(this.outputFile, Charsets.UTF_8);
  }
}


From Guava's Files class:

public static BufferedWriter newWriter(File file, Charset charset)
    throws FileNotFoundException {
  // always a java.io FileOutputStream, so always the local filesystem
  return new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(file), charset));
}


So a few questions on the above.

1. Am I correct in saying that if the outputFile starts with "s3n://" it writes to HDFS, otherwise it writes to the local FS?

2. If the above is true, then what does a URI starting with s3n:// mean?

3. Is there a way to force it to write to HDFS even if the URI doesn't start with s3n://, or am I going to have to modify the ClusterDumper class myself? (A sketch of what I have in mind is below.)
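
For what it's worth, this is the kind of scheme-agnostic write I was expecting. Just a sketch of what I have in mind, not the actual Mahout code; the hdfs://localhost:9000 path is an assumption, substitute your own namenode host:port.

import java.io.OutputStreamWriter;
import java.io.Writer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.google.common.base.Charsets;

// Resolve the FileSystem from the path's own URI (hdfs://, s3n://, file://, ...).
// With no scheme, Hadoop falls back to the configured default (fs.default.name).
Path p = new Path("hdfs://localhost:9000/output/clusterdump.txt");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(p.toUri(), conf);
Writer writer = new OutputStreamWriter(fs.create(p), Charsets.UTF_8);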




Re: ClusterDumper writing to local instead of HDFS

Posted by Chris Harrington <ch...@heystaks.com>.
Thanks Dan,

I went ahead and changed the ClusterDumper class. It's not a pretty solution, since I just copied the ClusterDumper Java code into my own file and added a new option:

private static final String WRITE_TO_HADOOP = "writeToHadoop"; // option key; this value is illustrative
private boolean writeToHadoop;

addOption(WRITE_TO_HADOOP, "wh", "Try to write to HDFS", "false");

writeToHadoop = hasOption(WRITE_TO_HADOOP);

and then added an OR to the if:

if (outputFile.getName().startsWith("s3n://") || writeToHadoop) 

so now I can force a write to HDFS with a '-wh' arg. Putting it together, the writer selection in my copy now looks like the sketch below.
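
Roughly this, reassembled from the snippets above (a sketch, not a diff; the surrounding fields are as in the original ClusterDumper):

if (this.outputFile == null) {
  shouldClose = false;
  writer = new OutputStreamWriter(System.out);
} else {
  shouldClose = true;
  if (outputFile.getName().startsWith("s3n://") || writeToHadoop) {
    // -wh forces the Hadoop FileSystem route even without an s3n:// scheme
    Path p = outputPath;
    FileSystem fs = FileSystem.get(p.toUri(), conf);
    writer = new OutputStreamWriter(fs.create(p), Charsets.UTF_8);
  } else {
    writer = Files.newWriter(this.outputFile, Charsets.UTF_8);
  }
}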



Re: ClusterDumper writing to local instead of HDFS

Posted by Dan Filimon <da...@gmail.com>.
1. s3n is actually the URI scheme for the Amazon S3 filesystem [1]. Normally
HDFS URIs start with "hdfs://" and local URIs start with "file://".
There is a default configured in your local Hadoop setup
(fs.default.name). This [2] seems like a useful link.

2. See 1. :)

3. It looks like the default case (the last else) just uses whatever
your default filesystem is. Chances are that it's file:// in your
case. Setting the URI to hdfs://localhost:9000 (generically
hdfs://host:port; the port might be different on your machine) should
fix it. See the sketch below.
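
Concretely, either of these should reach HDFS regardless of what the
default is. Just a sketch; the namenode host/port is a guess, so check
your own config first:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();

// Option A: pass a fully qualified URI, bypassing the configured default.
FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

// Option B: override the default filesystem for this Configuration
// (fs.default.name is the classic key name).
conf.set("fs.default.name", "hdfs://localhost:9000");
FileSystem viaDefault = FileSystem.get(conf);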

Good luck!

[1] http://wiki.apache.org/hadoop/AmazonS3
[2] http://www.greenplum.com/blog/dive-in/usage-and-quirks-of-fs-default-name-in-hadoop-filesystem
