Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2011/10/28 07:52:03 UTC

writing to hdfs via java api

I found a way to connect to Hadoop via HFTP, and it works fine (read-only):

    uri = "hftp://172.16.xxx.xxx:50070/";

    System.out.println( "uri: " + uri );
    Configuration conf = new Configuration();

    FileSystem fs = FileSystem.get( URI.create( uri ), conf );
    fs.printStatistics();

However, it appears that hftp is read-only, and I want to read and write as well
as copy files; that is, I want to connect over hdfs. How can I enable hdfs
connections so that I can edit the actual remote filesystem using the File
/ Path APIs? Are there ssh settings that have to be set before I can do
this?

I tried to change the protocol above from "hftp" to "hdfs", but I got the
following exception:

Exception in thread "main" java.io.IOException: Call to /172.16.112.131:50070 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at sb.HadoopRemote.main(HadoopRemote.java:24)
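
[Editor's note: a write over hdfs:// has to go to the NameNode's RPC port (the fs.default.name value in core-site.xml, commonly 8020), not the 50070 web port that hftp uses; pointing the hdfs:// client at an HTTP port is one common way to get an EOFException like the one above. A minimal write sketch, with a placeholder hostname rather than any address from this thread:]

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        // Use the NameNode RPC port from fs.default.name (often 8020),
        // not the 50070 HTTP port that hftp:// talks to.
        String uri = "hdfs://namenode.example.com:8020/";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        // Create (or overwrite) a remote file and write a few bytes to it.
        Path out = new Path("/tmp/hello.txt");
        FSDataOutputStream stream = fs.create(out, true);
        stream.writeBytes("hello hdfs\n");
        stream.close();

        // copyFromLocalFile covers the "copy files" case.
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/tmp/remote.txt"));
    }
}
```

No ssh settings are involved; the client only needs network access to the NameNode RPC port (and to the DataNode ports for the actual block transfers).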

Re: writing to hdfs via java api

Posted by JAX <ja...@gmail.com>.
Hi Tom: which log will have info about why a process was Killed?

Sent from my iPad

On Oct 28, 2011, at 11:41 PM, Tom Melendez <to...@supertom.com> wrote:

> Hi Jay,
> 
> Are you able to look at the logs or the web interface?  Can you find
> out why it's getting killed?
> 
> Also, can you verify that these ports are open and a process is
> connected to them (maybe with netstat)?
> 
> http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
> 
> Thanks,
> 
> Tom
> 
> On Fri, Oct 28, 2011 at 7:57 PM, Jay Vyas <ja...@gmail.com> wrote:
>> Thanks Tom: That's interesting...
>> 
>> First I tried, and it complained that the input directory didn't exist, so I
>> ran
>> $> hadoop fs -mkdir /user/cloudera/input
>> 
>> Then I tried to do this:
>> 
>> $> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2
>> 'dfs[a-z.]+'
>> 
>> And it seemed to start working... but then it abruptly printed "Killed"
>> at the end of the job [scroll down]?
>> 
>> Maybe this is related to why I can't connect?
>> 
>> 1) the hadoop jar 11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to
>> load native-hadoop library for your platform... using builtin-java classes
>> where applicable
>> 11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
>> 11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process
>> : 0
>> 11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
>> 11/10/14 21:34:45 INFO mapred.JobClient:  map 0% reduce 0%
>> 11/10/14 21:34:55 INFO mapred.JobClient:  map 0% reduce 100%
>> 11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
>> 11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
>> 11/10/14 21:34:57 INFO mapred.JobClient:   Job Counters
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Launched reduce tasks=1
>> 11/10/14 21:34:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5627
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Total time spent by all reduces
>> waiting after reserving slots (ms)=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=5050
>> 11/10/14 21:34:57 INFO mapred.JobClient:   FileSystemCounters
>> 11/10/14 21:34:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=53452
>> 11/10/14 21:34:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=86
>> 11/10/14 21:34:57 INFO mapred.JobClient:   Map-Reduce Framework
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Reduce input groups=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Combine output records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Reduce shuffle bytes=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Reduce output records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Spilled Records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Combine input records=0
>> 11/10/14 21:34:57 INFO mapred.JobClient:     Reduce input records=0
>> 11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process
>> : 1
>> 11/10/14 21:34:58 INFO mapred.JobClient: Running job: job_201110142010_0010
>> 11/10/14 21:34:59 INFO mapred.JobClient:  map 0% reduce 0%
>> Killed
>> 
>> 
>> On Fri, Oct 28, 2011 at 8:24 PM, Tom Melendez <to...@supertom.com> wrote:
>> 
>>> Hi Jay,
>>> 
>>> Some questions for you:
>>> 
>>> - Does the hadoop client itself work from that same machine?
>>> - Are you actually able to run the hadoop example jar (in other words,
>>> your setup is valid otherwise)?
>>> - Is port 8020 actually available?  (you can telnet or nc to it?)
>>> - What does jps show on the namenode?
>>> 
>>> Thanks,
>>> 
>>> Tom
>>> 
>>> On Fri, Oct 28, 2011 at 4:04 PM, Jay Vyas <ja...@gmail.com> wrote:
>>>> Hi guys: I've made more progress debugging my Hadoop connection, but still
>>>> haven't got it working. It looks like my VM (Cloudera Hadoop) won't
>>>> let me in. I find that there is no issue connecting to the namenode -
>>>> that is, using hftp and port 50070 -
>>>> via standard HFTP as here:
>>>> 
>>>> // This method works fine - connecting directly to Hadoop's namenode
>>>> // and querying the filesystem
>>>> public static void main1(String[] args) throws Exception
>>>>    {
>>>>        String uri = "hftp://155.37.101.76:50070/";
>>>> 
>>>>        System.out.println( "uri: " + uri );
>>>>        Configuration conf = new Configuration();
>>>> 
>>>>        FileSystem fs = FileSystem.get( URI.create( uri ), conf );
>>>>        fs.printStatistics();
>>>>    }
>>>> 
>>>> 
>>>> But unfortunately, I can't get into hdfs. Any thoughts on this? I am
>>>> modifying the uri to access port 8020,
>>>> which is what is in my core-site.xml.
>>>> 
>>>>    // This fails: it tries to connect over and over again, and eventually
>>>>    // gives up, printing "already tried to connect 20 times"
>>>>    public static void main(String[] args)
>>>>    {
>>>>        try {
>>>>            String uri = "hdfs://155.37.101.76:8020/";
>>>> 
>>>>            System.out.println( "uri: " + uri );
>>>>            Configuration conf = new Configuration();
>>>> 
>>>>            FileSystem fs = FileSystem.get( URI.create( uri ), conf );
>>>>            fs.printStatistics();
>>>>        } catch (Exception e) {
>>>>            // TODO Auto-generated catch block
>>>>            e.printStackTrace();
>>>>        }
>>>>    }
>>>> 
>>>> The error message is :
>>>> 
>>>> 11/10/28 19:03:38 INFO ipc.Client: Retrying connect to server: /
>>>> 155.37.101.76:8020. Already tried 0 time(s).
>>>> 11/10/28 19:03:39 INFO ipc.Client: Retrying connect to server: /
>>>> 155.37.101.76:8020. Already tried 1 time(s).
>>>> 11/10/28 19:03:40 INFO ipc.Client: Retrying connect to server: /
>>>> 155.37.101.76:8020. Already tried 2 time(s).
>>>> 11/10/28 19:03:41 INFO ipc.Client: Retrying connect to server: /
>>>> 155.37.101.76:8020. Already tried 3 time(s).
>>>> 
>>>> Any thoughts on this would *really* be appreciated... Thanks guys.
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Jay Vyas
>> MMSB/UCHC
>> 
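
[Editor's note: Tom's suggestion above to check the port with telnet or nc can also be done from Java. A small sketch (the host below is a placeholder for the namenode address in core-site.xml) that reports whether a TCP connect succeeds; if the 8020 check fails, the ipc.Client retry loop quoted above is exactly what you would expect to see:]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            try { s.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) {
        // Placeholder: replace with the namenode host from core-site.xml.
        String host = "namenode.example.com";
        System.out.println("8020 reachable:  " + isReachable(host, 8020, 2000));
        System.out.println("50070 reachable: " + isReachable(host, 50070, 2000));
    }
}
```

If 50070 answers but 8020 does not, the NameNode RPC port is likely firewalled or bound to a different interface than the web UI.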

Re: writing to hdfs via java api

Posted by Alex Gauthier <al...@gmail.com>.
Touché my friend... if only I could... :)

On Fri, Oct 28, 2011 at 9:16 PM, JAX <ja...@gmail.com> wrote:

> Yup.... Brutal :-|
> but you never regret fixing a bug   ... Unlike -------

Re: writing to hdfs via java api

Posted by JAX <ja...@gmail.com>.
Yup.... Brutal :-|
but you never regret fixing a bug   ... Unlike -------

Sent from my iPad

On Oct 28, 2011, at 11:43 PM, Alex Gauthier <al...@gmail.com> wrote:

> Brutal Friday night.  Coding < pussy.
> 
> :)

Re: writing to hdfs via java api

Posted by Alex Gauthier <al...@gmail.com>.
Brutal Friday night.  Coding < pussy.

:)


Re: writing to hdfs via java api

Posted by Alex Gauthier <al...@gmail.com>.
On Fri, Oct 28, 2011 at 8:41 PM, Tom Melendez <to...@supertom.com> wrote:

> Hi Jay,
>
> Are you able to look at the logs or the web interface?  Can you find
> out why it's getting killed?
> >> >
> >> >        System.out.println( "uri: " + uri );
> >> >        Configuration conf = new Configuration();
> >> >
> >> >        FileSystem fs = FileSystem.get( URI.create( uri ), conf );
> >> >        fs.printStatistics();
> >> >    }
> >> >
> >> >
> >> > But unfortunately, I can't get into hdfs ..... Any thoughts on this ?
>  I
> >> am
> >> > modifying the uri to access port 8020
> >> > which is what is in my core-site.xml .
> >> >
> >> >   // This fails, resulting (trys to connect over and over again,
> >> eventually
> >> > gives up printing "already tried to connect 20 times"....)
> >> >    public static void main(String[] args)
> >> >    {
> >> >        try {
> >> >            String uri = "hdfs://155.37.101.76:8020/";
> >> >
> >> >            System.out.println( "uri: " + uri );
> >> >            Configuration conf = new Configuration();
> >> >
> >> >            FileSystem fs = FileSystem.get( URI.create( uri ), conf );
> >> >            fs.printStatistics();
> >> >        } catch (Exception e) {
> >> >            // TODO Auto-generated catch block
> >> >            e.printStackTrace();
> >> >        }
> >> >    }
> >> >
> >> > The error message is :
> >> >
> >> > 11/10/28 19:03:38 INFO ipc.Client: Retrying connect to server: /
> >> > 155.37.101.76:8020. Already tried 0 time(s).
> >> > 11/10/28 19:03:39 INFO ipc.Client: Retrying connect to server: /
> >> > 155.37.101.76:8020. Already tried 1 time(s).
> >> > 11/10/28 19:03:40 INFO ipc.Client: Retrying connect to server: /
> >> > 155.37.101.76:8020. Already tried 2 time(s).
> >> > 11/10/28 19:03:41 INFO ipc.Client: Retrying connect to server: /
> >> > 155.37.101.76:8020. Already tried 3 time(s).
> >> >
> >> > Any thoughts on this would be *really* be appreciated  ... Thanks
> guys.
> >> >
> >>
> >
> >
> >
> > --
> > Jay Vyas
> > MMSB/UCHC
> >
>

Re: writing to hdfs via java api

Posted by Tom Melendez <to...@supertom.com>.
Hi Jay,

Are you able to look at the logs or the web interface?  Can you find
out why it's getting killed?

Also, can you verify that these ports are open and that a process is
listening on them (maybe with netstat)?

http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/

Thanks,

Tom

On Fri, Oct 28, 2011 at 7:57 PM, Jay Vyas <ja...@gmail.com> wrote:

Re: writing to hdfs via java api

Posted by Jay Vyas <ja...@gmail.com>.
Thanks, Tom - that's interesting...

First I tried it, and it complained that the input directory didn't exist,
so I ran:

$> hadoop fs -mkdir /user/cloudera/input

Then I tried this:

$> hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar grep input output2
'dfs[a-z.]+'

It seemed to start working, but then it abruptly printed "Killed" at the
end of the job [scroll down]?

Maybe this is related to why I can't connect?!

1) The hadoop jar output:
11/10/14 21:34:43 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
11/10/14 21:34:43 WARN snappy.LoadSnappy: Snappy native library not loaded
11/10/14 21:34:43 INFO mapred.FileInputFormat: Total input paths to process : 0
11/10/14 21:34:44 INFO mapred.JobClient: Running job: job_201110142010_0009
11/10/14 21:34:45 INFO mapred.JobClient:  map 0% reduce 0%
11/10/14 21:34:55 INFO mapred.JobClient:  map 0% reduce 100%
11/10/14 21:34:57 INFO mapred.JobClient: Job complete: job_201110142010_0009
11/10/14 21:34:57 INFO mapred.JobClient: Counters: 14
11/10/14 21:34:57 INFO mapred.JobClient:   Job Counters
11/10/14 21:34:57 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/14 21:34:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5627
11/10/14 21:34:57 INFO mapred.JobClient:     Total time spent by all reduces
waiting after reserving slots (ms)=0
11/10/14 21:34:57 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/14 21:34:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=5050
11/10/14 21:34:57 INFO mapred.JobClient:   FileSystemCounters
11/10/14 21:34:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=53452
11/10/14 21:34:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=86
11/10/14 21:34:57 INFO mapred.JobClient:   Map-Reduce Framework
11/10/14 21:34:57 INFO mapred.JobClient:     Reduce input groups=0
11/10/14 21:34:57 INFO mapred.JobClient:     Combine output records=0
11/10/14 21:34:57 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/10/14 21:34:57 INFO mapred.JobClient:     Reduce output records=0
11/10/14 21:34:57 INFO mapred.JobClient:     Spilled Records=0
11/10/14 21:34:57 INFO mapred.JobClient:     Combine input records=0
11/10/14 21:34:57 INFO mapred.JobClient:     Reduce input records=0
11/10/14 21:34:57 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/10/14 21:34:58 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/14 21:34:58 INFO mapred.JobClient: Running job: job_201110142010_0010
11/10/14 21:34:59 INFO mapred.JobClient:  map 0% reduce 0%
Killed


On Fri, Oct 28, 2011 at 8:24 PM, Tom Melendez <to...@supertom.com> wrote:




-- 
Jay Vyas
MMSB/UCHC

Re: writing to hdfs via java api

Posted by Tom Melendez <to...@supertom.com>.
Hi Jay,

Some questions for you:

- Does the hadoop client itself work from that same machine?
- Are you actually able to run the hadoop examples jar (in other words, is
your setup otherwise valid)?
- Is port 8020 actually reachable? (can you telnet or nc to it?)
- What does jps show on the namenode?
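To script the port check above without telnet or nc, a plain TCP probe works. This is a generic Java sketch, not anything Hadoop-specific; the host and ports below are placeholders for your namenode's address:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Substitute your namenode host and ports here.
        String host = "155.37.101.76";
        System.out.println("8020 reachable: " + canConnect(host, 8020, 2000));
        System.out.println("50070 reachable: " + canConnect(host, 50070, 2000));
    }
}
```

Note that this only proves the port accepts TCP connections; it says nothing about whether the right daemon is listening behind it.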

Thanks,

Tom

On Fri, Oct 28, 2011 at 4:04 PM, Jay Vyas <ja...@gmail.com> wrote:

Re: writing to hdfs via java api

Posted by Jay Vyas <ja...@gmail.com>.
Hi guys: I've made more progress debugging my hadoop connection, but still
haven't got it working. It looks like my VM (Cloudera Hadoop) won't let me
in. I find that there is no issue connecting to the namenode - that is,
using hftp on port 50070 - via standard HFTP, as here:

// This method works fine - connecting directly to Hadoop's namenode and
// querying the filesystem. It needs org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem and java.net.URI imports.
public static void main1(String[] args) throws Exception {
    String uri = "hftp://155.37.101.76:50070/";
    System.out.println("uri: " + uri);

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    fs.printStatistics();
}


But unfortunately, I can't get into hdfs. Any thoughts on this? I am
modifying the uri to use port 8020, which is what is in my core-site.xml:

// This fails: it retries the connection over and over, and eventually
// gives up after printing "already tried to connect 20 times".
public static void main(String[] args) {
    try {
        String uri = "hdfs://155.37.101.76:8020/";
        System.out.println("uri: " + uri);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        fs.printStatistics();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The error message is:

11/10/28 19:03:38 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 0 time(s).
11/10/28 19:03:39 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 1 time(s).
11/10/28 19:03:40 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 2 time(s).
11/10/28 19:03:41 INFO ipc.Client: Retrying connect to server: /155.37.101.76:8020. Already tried 3 time(s).

Any thoughts on this would *really* be appreciated ... Thanks, guys.

Re: writing to hdfs via java api

Posted by Harsh J <ha...@cloudera.com>.
Jay,

Using the hdfs:// scheme is the right way, as you have determined. However…

A few things you need to ensure while using the Java FileSystem API to
do your HDFS tasks:

- Connect to NameNode's RPC port, not the web port. Default RPC port
is usually 8020, but your fs.default.name config will tell you the
right one.
- Do your client and server Hadoop versions match perfectly? If not,
make them match, as you could otherwise run into protocol
incompatibility issues between versions.
- Ensure your client can connect to the RPC ports of NameNode and
DataNode both for reads/writes. If there's a firewall, you may need to
configure it to allow this.
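As a sanity check on the first point, you can pull fs.default.name straight out of core-site.xml with plain JAXP and see exactly which host and port the client will dial. This is a standalone sketch; the /etc/hadoop/conf path is just where CDH typically puts the file:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FindNameNode {
    // Returns the value of fs.default.name (e.g. "hdfs://host:8020")
    // from a Hadoop core-site.xml, or null if the property is absent.
    static String defaultFs(File coreSite) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(coreSite);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            if (name.equals("fs.default.name")) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(defaultFs(new File("/etc/hadoop/conf/core-site.xml")));
    }
}
```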

On Fri, Oct 28, 2011 at 11:22 AM, Jay Vyas <ja...@gmail.com> wrote:
> I found a way to connect to hadoop via hftp, and it works fine, (read only)
> :
>
>    uri = "hftp://172.16.xxx.xxx:50070/";
>
>    System.out.println( "uri: " + uri );
>    Configuration conf = new Configuration();
>
>    FileSystem fs = FileSystem.get( URI.create( uri ), conf );
>    fs.printStatistics();
>
> However, it appears that hftp is read only, and I want to read/write as well
> as copy files, that is, I want to connect over hdfs . How can I enable hdfs
> connections so that i can edit the actual , remote filesystem using the file
> / path's APIs  ?  Are there ssh settings that have to be set before i can do
> this > ?
>
> I tried to change the protocol above from "hftp" -> "hdfs", but I got the
> following exception ...
>
> Exception in thread "main" java.io.IOException: Call to /
> 172.16.112.131:50070 failed on local exception: java.io.EOFException at
> org.apache.hadoop.ipc.Client.wrapException(Client.java:1139) at
> org.apache.hadoop.ipc.Client.call(Client.java:1107) at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) at
> $Proxy0.getProtocolVersion(Unknown Source) at
> org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398) at
> org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384) at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111) at
> org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:213) at
> org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:180) at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514) at
> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67) at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548) at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530) at
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228) at
> sb.HadoopRemote.main(HadoopRemote.java:24)
>



-- 
Harsh J

Re: writing to hdfs via java api

Posted by Arpit Gupta <ar...@hortonworks.com>.
The hdfs scheme should work, but you will have to change the port. To find
the correct port number, look for the fs.default.name property in
core-site.xml; the namenode UI should also state the port.
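Put differently, only the scheme and port change relative to the working hftp URL. A small sketch with java.net.URI (8020 is the usual default RPC port here, but substitute whatever your fs.default.name actually says):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RewriteUri {
    // Turns a working "hftp://host:50070/" web URL into the matching
    // "hdfs://host:<rpcPort>/" RPC URL, keeping host and path as-is.
    static URI toHdfs(URI hftp, int rpcPort) throws URISyntaxException {
        return new URI("hdfs", null, hftp.getHost(), rpcPort,
                hftp.getPath(), null, null);
    }

    public static void main(String[] args) throws Exception {
        URI web = new URI("hftp://172.16.112.131:50070/");
        System.out.println(toHdfs(web, 8020));
    }
}
```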

--
Arpit

On Oct 27, 2011, at 10:52 PM, Jay Vyas <ja...@gmail.com> wrote:
