Posted to common-user@hadoop.apache.org by Murali Krishna <mu...@yahoo-inc.com> on 2008/07/22 13:39:20 UTC
distcp skipping the file
Hi,
My source folder has a single folder and a single file inside that.
/user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
In the destination, it is creating the folder '1' but not the file '2'.
The counters show 1 file has been skipped.
08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
If I create one more file in any directory under the distcpsrc folder,
it copies both files properly. Is this a bug?
[I am using 0.15.3]
Thanks,
Murali
RE: Hadoop warnings in pseudo-distributed mode
Posted by Arv Mistry <ar...@kindsight.net>.
Sorry, found the error of my ways... I forgot to add 127.0.0.1 to the
master/slave files
Cheers Arv
-----Original Message-----
From: Arv Mistry
Sent: Tuesday, July 29, 2008 8:53 AM
To: 'core-user@hadoop.apache.org'
Subject: Hadoop warnings in pseudo-distributed mode
Could anyone tell me whether it is normal to get the warning "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. with everything on one machine?
It seems to be writing to the files that I expect; I just get this
warning.
If it isn't normal, here is some background:
- I did have it running in a distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?
Any help would be appreciated,
Cheers Arv
Hadoop warnings in pseudo-distributed mode
Posted by Arv Mistry <ar...@kindsight.net>.
Could anyone tell me whether it is normal to get the warning "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. with everything on one machine?
It seems to be writing to the files that I expect; I just get this
warning.
If it isn't normal, here is some background:
- I did have it running in a distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?
Any help would be appreciated,
Cheers Arv
Re: distcp skipping the file
Posted by Chris Douglas <ch...@yahoo-inc.com>.
>> The -update behavior is by design.
>
> If I am right, -update is to overwrite the file at the destination if it
> is already there. But, in this case it is overwriting the folder as a
> file at destination which seems to be a bug
>
-update will replace the file if the source and destination sizes
differ.
Copying a single file, particularly with -update, is a corner case for
distcp. It distributes the copy at file granularity, so there is no
advantage to using it in this case. In 0.15, IIRC, -update and
-overwrite control only the "overwrite" argument in the call to create,
which will replace directories when true. Many of these semantic
oddities have been addressed in subsequent versions. The
interpretation of the source and destination paths is slightly
different when either of the two aforementioned options is set, as is
covered in the documentation to be released with 0.18, currently
available in subversion. -C
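
A minimal sketch of the two copy rules stated in this thread: anything that already exists at the destination is skipped by default, and -update replaces a file when the source and destination sizes differ. The function name `should_copy` and its parameters are hypothetical, chosen only for illustration. This is a simplified model of the described behavior, not distcp's actual code, and it does not cover -overwrite or the directory-replacement oddity in 0.15.

```python
from typing import Optional


def should_copy(src_size: int, dst_size: Optional[int], update: bool = False) -> bool:
    """Decide whether a single file would be copied (hypothetical model).

    src_size: size of the source file in bytes.
    dst_size: size of the file at the destination, or None if nothing exists there.
    update:   models the -update flag as described in the thread.
    """
    if dst_size is None:
        # Nothing at the destination yet, so the file is copied.
        return True
    if update:
        # With -update, an existing file is replaced only when the sizes differ.
        return src_size != dst_size
    # Without -update, anything already at the destination is skipped.
    return False


if __name__ == "__main__":
    # The 4-byte file from the listing, copied into an empty destination.
    print(should_copy(4, None))
```

Under this model, a file bound for an empty destination should be copied, so the "Files skipped=1" counter in the original report is not explained by these two rules alone.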
>
>> Could you provide the command line, and the directory structure before
>> and after issuing the copy? -C
>
> Cmd is: hadoop distcp -update
> 'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest
>
> hadoop dfs -lsr distcpsrc
> /user/<user>/distcpsrc/1 <dir> 2008-07-24 05:53
> /user/<user>/distcpsrc/1/t <r 3> 4 2008-07-22 06:12
>
> hadoop dfs -lsr distcp_dest
> /user/<user>/distcp_dest/1 <r 3> 4 2008-07-24 06:03 <<
> expected /user/<user>/distcp_dest/1/t, file is copied as '1' instead of
> '1/t'
>
> If I run without '-update', destination dir is:
> hadoop dfs -lsr distcp_dest_noupdate
> /user/<user>/distcp_dest_noupdate/1 <dir> 2008-07-24 06:08 <<
> file 't' is not copied and '1' is directory
>
> Thanks,
> Murali
>
>> On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
>>
>>> Hi,
>>> I am using 0.15.3 and the destination is empty. One more
>>> behavior that I am seeing is that if I pass '-update' option, it is
>>> writing the content of file '2' in folder 1. (Makes the folder '1' as
>>> file in the destination). So, it looks like it is treating the destination
>>> for file distcpsrc/1/2 as distcpdest/1.
>>>
>>> Thanks,
>>> Murali
>>>
>>>> -----Original Message-----
>>>> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
>>>> Sent: Wednesday, July 23, 2008 1:13 AM
>>>> To: core-user@hadoop.apache.org
>>>> Subject: Re: distcp skipping the file
>>>>
>>>> There were many fixes and improvements to distcp in 0.16, but most of
>>>> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
>>>> empty? Anything already existing at the destination is skipped. -C
>>>>
>>>> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My source folder has a single folder and a single file inside that.
>>>>>
>>>>> /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
>>>>>
>>>>> In the destination, it is creating the folder '1' but not the file
>>>>> '2'.
>>>>>
>>>>> The counters show 1 file has been skipped.
>>>>>
>>>>> 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
>>>>>
>>>>> If I create one more file in any directory under the distcpsrc
>>>>> folder,
>>>>> it copies both files properly. Is this a bug?
>>>>>
>>>>> [I am using 0.15.3]
>>>>>
>>>>> Thanks,
>>>>> Murali
RE: distcp skipping the file
Posted by Murali Krishna <mu...@yahoo-inc.com>.
Hi,
> The -update behavior is by design.
If I am right, -update is to overwrite the file at the destination if it
is already there. But, in this case it is overwriting the folder as a
file at destination which seems to be a bug
>
> Could you provide the command line, and the directory structure before
> and after issuing the copy? -C
Cmd is: hadoop distcp -update
'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest
hadoop dfs -lsr distcpsrc
/user/<user>/distcpsrc/1 <dir> 2008-07-24 05:53
/user/<user>/distcpsrc/1/t <r 3> 4 2008-07-22 06:12
hadoop dfs -lsr distcp_dest
/user/<user>/distcp_dest/1 <r 3> 4 2008-07-24 06:03 <<
expected /user/<user>/distcp_dest/1/t, file is copied as '1' instead of
'1/t'
If I run without '-update', destination dir is:
hadoop dfs -lsr distcp_dest_noupdate
/user/<user>/distcp_dest_noupdate/1 <dir> 2008-07-24 06:08 <<
file 't' is not copied and '1' is directory
Thanks,
Murali
>
> On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
>
> > Hi,
> > I am using 0.15.3 and the destination is empty. One more
> > behavior that I am seeing is that if I pass '-update' option, it is
> > writing the content of file '2' in folder 1. (Makes the folder '1' as
> > file in the destination). So, it looks like it is treating the destination
> > for file distcpsrc/1/2 as distcpdest/1.
> >
> > Thanks,
> > Murali
> >
> >> -----Original Message-----
> >> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
> >> Sent: Wednesday, July 23, 2008 1:13 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: distcp skipping the file
> >>
> >> There were many fixes and improvements to distcp in 0.16, but most of
> >> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
> >> empty? Anything already existing at the destination is skipped. -C
> >>
> >> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
> >>
> >>> Hi,
> >>>
> >>> My source folder has a single folder and a single file inside that.
> >>>
> >>> /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
> >>>
> >>> In the destination, it is creating the folder '1' but not the file
> >>> '2'.
> >>>
> >>> The counters show 1 file has been skipped.
> >>>
> >>> 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
> >>>
> >>>
> >>>
> >>> If I create one more file in any directory under the distcpsrc
> >>> folder,
> >>> it copies both files properly. Is this a bug?
> >>>
> >>> [I am using 0.15.3]
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Murali
> >>>
> >
Re: distcp skipping the file
Posted by Chris Douglas <ch...@yahoo-inc.com>.
The -update behavior is by design.
Could you provide the command line, and the directory structure before
and after issuing the copy? -C
On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
> Hi,
> I am using 0.15.3 and the destination is empty. One more
> behavior that I am seeing is that if I pass '-update' option, it is
> writing the content of file '2' in folder 1. (Makes the folder '1' as
> file in the destination). So, it looks like it is treating the destination
> for file distcpsrc/1/2 as distcpdest/1.
>
> Thanks,
> Murali
>
>> -----Original Message-----
>> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
>> Sent: Wednesday, July 23, 2008 1:13 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: distcp skipping the file
>>
>> There were many fixes and improvements to distcp in 0.16, but most of
>> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
>> empty? Anything already existing at the destination is skipped. -C
>>
>> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
>>
>>> Hi,
>>>
>>> My source folder has a single folder and a single file inside that.
>>>
>>> /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
>>>
>>> In the destination, it is creating the folder '1' but not the file
>>> '2'.
>>>
>>> The counters show 1 file has been skipped.
>>>
>>> 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
>>>
>>>
>>>
>>> If I create one more file in any directory under the distcpsrc
>>> folder,
>>> it copies both files properly. Is this a bug?
>>>
>>> [I am using 0.15.3]
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Murali
>>>
>
RE: distcp skipping the file
Posted by Murali Krishna <mu...@yahoo-inc.com>.
Hi,
I am using 0.15.3 and the destination is empty. One more
behavior that I am seeing is that if I pass '-update' option, it is
writing the content of file '2' in folder 1. (Makes the folder '1' as
file in the destination). So, it looks like it is treating the destination
for file distcpsrc/1/2 as distcpdest/1.
Thanks,
Murali
> -----Original Message-----
> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
> Sent: Wednesday, July 23, 2008 1:13 AM
> To: core-user@hadoop.apache.org
> Subject: Re: distcp skipping the file
>
> There were many fixes and improvements to distcp in 0.16, but most of
> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
> empty? Anything already existing at the destination is skipped. -C
>
> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
>
> > Hi,
> >
> > My source folder has a single folder and a single file inside that.
> >
> > /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
> >
> > In the destination, it is creating the folder '1' but not the file
> > '2'.
> >
> > The counters show 1 file has been skipped.
> >
> > 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
> >
> >
> >
> > If I create one more file in any directory under the distcpsrc
> > folder,
> > it copies both files properly. Is this a bug?
> >
> > [I am using 0.15.3]
> >
> >
> >
> > Thanks,
> >
> > Murali
> >
Re: distcp skipping the file
Posted by Chris Douglas <ch...@yahoo-inc.com>.
There were many fixes and improvements to distcp in 0.16, but most of
the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
empty? Anything already existing at the destination is skipped. -C
On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
> Hi,
>
> My source folder has a single folder and a single file inside that.
>
> /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
>
> In the destination, it is creating the folder '1' but not the file
> '2'.
>
> The counters show 1 file has been skipped.
>
> 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
>
>
>
> If I create one more file in any directory under the distcpsrc
> folder,
> it copies both files properly. Is this a bug?
>
> [I am using 0.15.3]
>
>
>
> Thanks,
>
> Murali
>