Posted to users@kafka.apache.org by Yang <te...@gmail.com> on 2013/09/03 22:40:10 UTC

how can I let camus etl point to hdfs instead of local?

I tried to run camus with the properties file in the examples dir:

java -cp ...... com.linkedin.camus.etl.kafka.CamusJob -P
myproperties.properties


Then it says that my output dir does not exist:
~/tools/camus/camus-etl-kafka$ java -cp
target/camus-etl-kafka-0.1.0-SNAPSHOT.jar
com.linkedin.camus.etl.kafka.CamusJob -P camus.properties
Starting Kafka ETL Job
The blacklisted topics: []
The whitelisted topics: []
Dir Destination set to: /camus/out
Getting the base paths.
The execution base path does not exist. Creating the directory
The history base path does not exist. Creating the directory.
Exception in thread "main" java.io.FileNotFoundException: File /camus/exec
does not exist.
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
 at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:801)
 at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:223)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:556)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)



I tried changing the dir to hdfs://localhost/camus/out, and then it says:
$ java -cp target/camus-etl-kafka-0.1.0-SNAPSHOT.jar
com.linkedin.camus.etl.kafka.CamusJob -P camus.properties
Starting Kafka ETL Job
The blacklisted topics: []
The whitelisted topics: []
Dir Destination set to: hdfs://localhost/camus/out
Getting the base paths.
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS:
hdfs://localhost/camus/exec, expected: file:///
 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
at
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
 at org.apache.hadoop.fs.LocalFileSystem.pathToFile(LocalFileSystem.java:61)
at org.apache.hadoop.fs.LocalFileSystem.exists(LocalFileSystem.java:51)
 at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:211)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:556)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)


So how can I let Camus know that it is running against HDFS, so that the
code at CamusJob.java:140

    private Job createJob(Properties props) throws IOException {
        Job job = new Job(getConf());

gets a conf that points to my HDFS setup?
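
In other words, I'd like the conf inside createJob() to end up pointing at
HDFS, roughly as if I had built it like this (just a sketch of what I mean,
not actual Camus code; fs.default.name is the hadoop-1.x property name):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsConfCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // force HDFS as the default filesystem instead of file:///
            // (namenode URI; adjust host/port to your setup)
            conf.set("fs.default.name", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            // should print the hdfs:// URI, not file:///
            System.out.println("default FS: " + fs.getUri());
            // paths like /camus/out would then resolve against HDFS
            System.out.println("/camus/out exists: " + fs.exists(new Path("/camus/out")));
        }
    }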

I have already set my HADOOP_CONF_DIR env variable to point at the conf
directory of my running Hadoop.
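
As far as I understand, a plain java -cp launch does not put
$HADOOP_CONF_DIR on the classpath the way the bin/hadoop script does, so
core-site.xml may never be loaded at all. If that is the problem, maybe
appending the conf dir to the classpath would already be enough (untested
guess on my part):

    java -cp target/camus-etl-kafka-0.1.0-SNAPSHOT.jar:$HADOOP_CONF_DIR \
        com.linkedin.camus.etl.kafka.CamusJob -P camus.properties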


thanks
Yang

Re: how can I let camus etl point to hdfs instead of local?

Posted by Yang <te...@gmail.com>.
Neha:

Thanks for your response. I figured it out: it seems the new hadoop-1.x
default port changed from 8020 to 9000, so I had to use port 9000 in my
default fs setting.

I had to put a line like

defaultFS=hdfs://localhost:9000/

into my camus.properties.
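
Concretely, the relevant part of my camus.properties ended up looking
roughly like this (key names from memory, so double-check against the
example file; the default-filesystem key is fs.default.name on hadoop-1.x,
fs.defaultFS on newer Hadoop):

    # make Camus resolve its paths against HDFS instead of file:///
    fs.default.name=hdfs://localhost:9000
    # base paths from the example config, now created on HDFS
    etl.destination.path=/camus/out
    etl.execution.base.path=/camus/exec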


camus_etl@googlegroups.com seems less active :)


thanks
Yang


On Tue, Sep 3, 2013 at 6:31 PM, Neha Narkhede <ne...@gmail.com> wrote:

> I'm sorry I'm not too familiar with the camus code base. Have you tried
> pinging camus_etl@googlegroups.com ?
>
> Thanks,
> Neha

Re: how can I let camus etl point to hdfs instead of local?

Posted by Neha Narkhede <ne...@gmail.com>.
I'm sorry I'm not too familiar with the camus code base. Have you tried
pinging camus_etl@googlegroups.com ?

Thanks,
Neha

