Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2011/10/28 07:52:03 UTC
writing to hdfs via java api
I found a way to connect to Hadoop via HFTP, and it works fine (read only):

String uri = "hftp://172.16.xxx.xxx:50070/";
System.out.println( "uri: " + uri );
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get( URI.create( uri ), conf );
fs.printStatistics();
However, it appears that HFTP is read-only, and I want to read and write as well as copy files; that is, I want to connect over HDFS. How can I enable HDFS connections so that I can edit the actual remote filesystem using the File/Path APIs? Are there SSH settings that have to be set before I can do this?
I tried to change the protocol above from "hftp" to "hdfs", but I got the following exception...
Exception in thread "main" java.io.IOException: Call to /172.16.112.131:50070 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at sb.HadoopRemote.main(HadoopRemote.java:24)
Re: writing to hdfs via java api
Posted by JAX <ja...@gmail.com>.
Hi Tom: which log will have info about why a process was killed?
Sent from my iPad
On Oct 28, 2011, at 11:41 PM, Tom Melendez <to...@supertom.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Alex Gauthier <al...@gmail.com>.
Touché my friend... if only I could only.... :)
On Fri, Oct 28, 2011 at 9:16 PM, JAX <ja...@gmail.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by JAX <ja...@gmail.com>.
Yup.... Brutal :-|
but you never regret fixing a bug ... Unlike -------
Sent from my iPad
On Oct 28, 2011, at 11:43 PM, Alex Gauthier <al...@gmail.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Alex Gauthier <al...@gmail.com>.
Brutal Friday night. Coding < pussy.
:)
On Fri, Oct 28, 2011 at 8:43 PM, Alex Gauthier <al...@gmail.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Alex Gauthier <al...@gmail.com>.
On Fri, Oct 28, 2011 at 8:41 PM, Tom Melendez <to...@supertom.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Tom Melendez <to...@supertom.com>.
Hi Jay,
Are you able to look at the logs or the web interface? Can you find
out why it's getting killed?
Also, can you verify that these ports are open and a process is
connected to them (maybe with netstat)?
http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
Thanks,
Tom
On Fri, Oct 28, 2011 at 7:57 PM, Jay Vyas <ja...@gmail.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Jay Vyas <ja...@gmail.com>.
Thanks Tom, that's interesting....
First I tried it, and it complained that the input directory didn't exist, so I ran:
$> hadoop fs -mkdir /user/cloudera/input
Then I tried this:
$> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2 'dfs[a-z.]+'
And it seemed to start working... but then it abruptly printed "Killed" at the end of the job [scroll down]?
Maybe this is related to why I can't connect?!
1) the hadoop jar:
11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process : 0
11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
11/10/14 21:34:45 INFO mapred.JobClient: map 0% reduce 0%
11/10/14 21:34:55 INFO mapred.JobClient: map 0% reduce 100%
11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
11/10/14 21:34:57 INFO mapred.JobClient: Job Counters
11/10/14 21:34:57 INFO mapred.JobClient: Launched reduce tasks=1
11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5627
11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/14 21:34:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/10/14 21:34:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5050
11/10/14 21:34:57 INFO mapred.JobClient: FileSystemCounters
11/10/14 21:34:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53452
11/10/14 21:34:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=86
11/10/14 21:34:57 INFO mapred.JobClient: Map-Reduce Framework
11/10/14 21:34:57 INFO mapred.JobClient: Reduce input groups=0
11/10/14 21:34:57 INFO mapred.JobClient: Combine output records=0
11/10/14 21:34:57 INFO mapred.JobClient: Reduce shuffle bytes=0
11/10/14 21:34:57 INFO mapred.JobClient: Reduce output records=0
11/10/14 21:34:57 INFO mapred.JobClient: Spilled Records=0
11/10/14 21:34:57 INFO mapred.JobClient: Combine input records=0
11/10/14 21:34:57 INFO mapred.JobClient: Reduce input records=0
11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/14 21:34:58 INFO mapred.JobClient: Running job: job_201110142010_0010
11/10/14 21:34:59 INFO mapred.JobClient: map 0% reduce 0%
Killed
On Fri, Oct 28, 2011 at 8:24 PM, Tom Melendez <to...@supertom.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
--
Jay Vyas
MMSB/UCHC
Re: writing to hdfs via java api
Posted by Tom Melendez <to...@supertom.com>.
Hi Jay,
Some questions for you:
- Does the hadoop client itself work from that same machine?
- Are you actually able to run the hadoop example jar (in other words,
your setup is valid otherwise)?
- Is port 8020 actually available? (you can telnet or nc to it?)
- What does jps show on the namenode?
Thanks,
Tom
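
As a quick programmatic stand-in for the telnet/nc check in the third question, here is a minimal sketch using plain java.net.Socket; the host and port are placeholders for Jay's NameNode:

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        try {
            // connect() throws if nothing is listening on the NameNode RPC port.
            socket.connect(new InetSocketAddress("155.37.101.76", 8020), 2000);
            System.out.println("port 8020 is reachable");
        } finally {
            socket.close();
        }
    }
}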
On Fri, Oct 28, 2011 at 4:04 PM, Jay Vyas <ja...@gmail.com> wrote:
> [quoted text trimmed; the full message appears elsewhere in this thread]
Re: writing to hdfs via java api
Posted by Jay Vyas <ja...@gmail.com>.
Hi guys: Made more progress debugging my Hadoop connection, but I still haven't got it working... It looks like my VM (Cloudera Hadoop) won't let me in. I find that there is no issue connecting to the name node, that is, using hftp and 50070...
via standard HFTP as in here:

// This method works fine - connecting directly to hadoop's namenode
// and querying the filesystem
public static void main1(String[] args) throws Exception
{
    String uri = "hftp://155.37.101.76:50070/";
    System.out.println( "uri: " + uri );
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get( URI.create( uri ), conf );
    fs.printStatistics();
}

But unfortunately, I can't get into HDFS... Any thoughts on this? I am modifying the URI to access port 8020, which is what is in my core-site.xml.

// This fails (tries to connect over and over again, and eventually
// gives up, printing "already tried to connect 20 times"...)
public static void main(String[] args)
{
    try {
        String uri = "hdfs://155.37.101.76:8020/";
        System.out.println( "uri: " + uri );
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get( URI.create( uri ), conf );
        fs.printStatistics();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The error message is:
11/10/28 19:03:38 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 0 time(s).
11/10/28 19:03:39 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 1 time(s).
11/10/28 19:03:40 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 2 time(s).
11/10/28 19:03:41 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 3 time(s).
Any thoughts on this would *really* be appreciated... Thanks guys.
Re: writing to hdfs via java api
Posted by Harsh J <ha...@cloudera.com>.
Jay,
Using the hdfs:// scheme is the right way, as you have determined. However, there are a few things you need to ensure while using the Java FileSystem API to do your HDFS tasks:
- Connect to the NameNode's RPC port, not the web port. The default RPC port is usually 8020, but your fs.default.name config will tell you the right one.
- Do your client and server Hadoop versions match perfectly? If not, make it so, as you could run into protocol incompatibility issues between versions.
- Ensure your client can connect to the RPC ports of both the NameNode and the DataNodes for reads/writes. If there's a firewall, you may need to configure it to allow this.
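
To make the first two points concrete, here is a minimal write/read sketch against the NameNode RPC port; the host, port, and path are placeholders for Jay's setup, and it assumes matching client and cluster versions of the 0.20-era API:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.VersionInfo;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        // Print the client-side Hadoop version, to compare against the cluster's.
        System.out.println("client version: " + VersionInfo.getVersion());

        // Use the RPC port from fs.default.name, not the 50070 web port.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://155.37.101.76:8020/"), conf);

        // Write a small file, then stat it to confirm the round trip.
        Path p = new Path("/user/cloudera/hello.txt");
        FSDataOutputStream out = fs.create(p);
        out.writeBytes("hello hdfs\n");
        out.close();
        System.out.println("wrote " + fs.getFileStatus(p).getLen() + " bytes");
        fs.close();
    }
}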
On Fri, Oct 28, 2011 at 11:22 AM, Jay Vyas <ja...@gmail.com> wrote:
> [quoted text trimmed; the full message appears at the top of this thread]
--
Harsh J
Re: writing to hdfs via java api
Posted by Arpit Gupta <ar...@hortonworks.com>.
The hdfs scheme should work, but you will have to change the port. To find the correct port number, look for the fs.default.name property in core-site.xml; the NameNode UI should also state the port.
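
For example, a minimal sketch that reads the port straight out of the client's configuration rather than hard-coding it; this assumes the cluster's core-site.xml is on the client's classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowDefaultFs {
    public static void main(String[] args) throws Exception {
        // Configuration() loads core-site.xml from the classpath, so
        // fs.default.name should print something like hdfs://host:8020
        Configuration conf = new Configuration();
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));

        // With the config in place, no explicit URI is needed:
        FileSystem fs = FileSystem.get(conf);
        System.out.println("connected to: " + fs.getUri());
    }
}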
--
Arpit
On Oct 27, 2011, at 10:52 PM, Jay Vyas <ja...@gmail.com> wrote:
> [quoted text trimmed; the full message appears at the top of this thread]