Posted to common-user@hadoop.apache.org by Murali Krishna <mu...@yahoo-inc.com> on 2008/07/22 13:39:20 UTC

distcp skipping the file

Hi,

My source folder has a single folder and a single file inside that. 

/user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22

In the destination, it is creating the folder '1' but not the file '2'. 

The counters show 1 file has been skipped.

08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1

 

If I create one more file in any directory under the distcpsrc folder,
it copies both files properly. Is this a bug?

[I am using 0.15.3]

 

Thanks,

Murali


RE: Hadoop warnings in pseudo-distributed mode

Posted by Arv Mistry <ar...@kindsight.net>.
Sorry, found the error of my ways: I forgot to add 127.0.0.1 to the
master/slave files.

Cheers Arv


-----Original Message-----
From: Arv Mistry 
Sent: Tuesday, July 29, 2008 8:53 AM
To: 'core-user@hadoop.apache.org'
Subject: Hadoop warnings in pseudo-distributed mode

 
Could anyone tell me whether it is normal to get the warning "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. with everything on one machine?

It seems to be writing to the files that I expect; I just get this
warning.

If it isn't normal, some background:
 - I did have it running in distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?

Any help would be appreciated,

Cheers Arv

Hadoop warnings in pseudo-distributed mode

Posted by Arv Mistry <ar...@kindsight.net>.
 
Could anyone tell me whether it is normal to get the warning "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. with everything on one machine?

It seems to be writing to the files that I expect; I just get this
warning.

If it isn't normal, some background:
 - I did have it running in distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?

Any help would be appreciated,

Cheers Arv

Re: distcp skipping the file

Posted by Chris Douglas <ch...@yahoo-inc.com>.
>> The -update behavior is by design.
>
> If I am right, -update is to overwrite the file at the destination if it
> is already there. But in this case it is overwriting the folder as a
> file at the destination, which seems to be a bug.
>

-update will replace the file if the source and destination sizes differ.
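
As a rough illustration of the skip and replace rules described in this
thread (anything already existing at the destination is skipped by default,
and -update replaces a file when the source and destination sizes differ),
here is a minimal Python sketch. The function name should_copy and its
parameters are hypothetical, and the -overwrite branch follows the later
DistCp documentation rather than anything stated here. It models the
decision only; it is not distcp's actual code.

```python
from typing import Optional


def should_copy(src_size: int,
                dst_size: Optional[int],
                update: bool = False,
                overwrite: bool = False) -> bool:
    """Decide whether one source file would be copied.

    dst_size is None when nothing exists at the destination path.
    Hypothetical model of the rules discussed in the thread.
    """
    if dst_size is None:
        # Nothing at the destination yet: always copy.
        return True
    if overwrite:
        # Assumption (later DistCp docs): -overwrite replaces unconditionally.
        return True
    if update:
        # Per the thread: -update replaces the file if the sizes differ.
        return src_size != dst_size
    # Default: anything already existing at the destination is skipped.
    return False


if __name__ == "__main__":
    print(should_copy(4, None))              # destination empty
    print(should_copy(4, 4))                 # exists, no flags
    print(should_copy(4, 7, update=True))    # exists, sizes differ
```

Under this model, a "Files skipped" count above zero with an empty
destination would be unexpected, which is consistent with the report
being treated as a corner case rather than intended behavior.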

Copying a single file, particularly with -update, is a corner case for
distcp. It distributes the copy at file granularity, so there is no
advantage to using it in this case. In 0.15, IIRC, -update and
-overwrite control only the "overwrite" argument in the call to create,
which will replace directories when true. Many of these semantic
oddities have been addressed in subsequent versions. The
interpretation of the source and destination paths is slightly
different when either of the two aforementioned options is set, as is
covered in the documentation to be released with 0.18, currently
available in subversion. -C
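
To make the difference in path interpretation more concrete, here is a
small Python sketch modeled on the multi-source example in the later DistCp
documentation (sources /foo/a and /foo/b copied to /bar/foo). It is an
assumption-laden model, not the 0.15 behavior seen in this thread: the
function name target_path and its parameters are hypothetical, and only the
multi-source case is covered.

```python
import posixpath


def target_path(source_dir: str,
                relative_file: str,
                target_dir: str,
                update_or_overwrite: bool = False) -> str:
    """Return where a file under source_dir would land in target_dir.

    Hypothetical model of the documented multi-source behavior:
    - by default, each source directory is recreated under the target;
    - with -update or -overwrite, the *contents* of each source directory
      are copied directly into the target.
    """
    if update_or_overwrite:
        return posixpath.join(target_dir, relative_file)
    source_name = posixpath.basename(source_dir.rstrip("/"))
    return posixpath.join(target_dir, source_name, relative_file)


if __name__ == "__main__":
    for src in ("/foo/a", "/foo/b"):
        print(target_path(src, "ab", "/bar/foo"))
        print(target_path(src, "ab", "/bar/foo", update_or_overwrite=True))
```

In this model, two sources that both contain a file named ab map to the
same target path once -update or -overwrite is set, which is one reason
the options change how source and destination paths should be read.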

>> Could you provide the command line, and the directory structure before
>> and after issuing the copy? -C
>
> Cmd is: hadoop distcp -update
> 'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest
>
> hadoop dfs -lsr distcpsrc
> /user/<user>/distcpsrc/1 <dir>           2008-07-24 05:53
> /user/<user>/distcpsrc/1/t       <r 3>   4       2008-07-22 06:12
>
> hadoop dfs -lsr  distcp_dest
> /user/<user>/distcp_dest/1       <r 3>   4       2008-07-24 06:03
>   << expected /user/<user>/distcp_dest/1/t; the file is copied as '1'
>   instead of '1/t'
>
> If I run without '-update', destination dir is:
>
> hadoop dfs -lsr  distcp_dest_noupdate
> /user/<user>/distcp_dest_noupdate/1      <dir>           2008-07-24 06:08
>   << file 't' is not copied and '1' is a directory
>
> Thanks,
> Murali
>
>> On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
>>
>>> Hi,
>>> I am using 0.15.3 and the destination is empty. One more
>>> behavior that I am seeing is that if I pass the '-update' option, it
>>> writes the content of file '2' to folder '1' (making the folder '1' a
>>> file in the destination). So it looks like it is treating the destination
>>> for file distcpsrc/1/2 as distcpdest/1.
>>>
>>> Thanks,
>>> Murali
>>>
>>>> -----Original Message-----
>>>> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
>>>> Sent: Wednesday, July 23, 2008 1:13 AM
>>>> To: core-user@hadoop.apache.org
>>>> Subject: Re: distcp skipping the file
>>>>
>>>> There were many fixes and improvements to distcp in 0.16, but most of
>>>> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
>>>> empty? Anything already existing at the destination is skipped. -C
>>>>
>>>> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My source folder has a single folder and a single file inside that.
>>>>>
>>>>> /user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22
>>>>>
>>>>> In the destination, it is creating the folder '1' but not the file
>>>>> '2'.
>>>>>
>>>>> The counters show 1 file has been skipped.
>>>>>
>>>>> 08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1
>>>>>
>>>>> If I create one more file in any directory under the distcpsrc
>>>>> folder, it copies both files properly. Is this a bug?
>>>>>
>>>>> [I am using 0.15.3]
>>>>>
>>>>> Thanks,
>>>>> Murali


RE: distcp skipping the file

Posted by Murali Krishna <mu...@yahoo-inc.com>.
Hi,

> The -update behavior is by design.

If I am right, -update is to overwrite the file at the destination if it
is already there. But in this case it is overwriting the folder as a
file at the destination, which seems to be a bug.

> Could you provide the command line, and the directory structure before
> and after issuing the copy? -C

Cmd is: hadoop distcp -update
'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest

hadoop dfs -lsr distcpsrc
/user/<user>/distcpsrc/1 <dir>           2008-07-24 05:53
/user/<user>/distcpsrc/1/t       <r 3>   4       2008-07-22 06:12

hadoop dfs -lsr  distcp_dest
/user/<user>/distcp_dest/1       <r 3>   4       2008-07-24 06:03
  << expected /user/<user>/distcp_dest/1/t; the file is copied as '1'
  instead of '1/t'

If I run without '-update', destination dir is:

hadoop dfs -lsr  distcp_dest_noupdate
/user/<user>/distcp_dest_noupdate/1      <dir>           2008-07-24 06:08
  << file 't' is not copied and '1' is a directory

Thanks,
Murali

> On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
>
> > Hi,
> > I am using 0.15.3 and the destination is empty. One more
> > behavior that I am seeing is that if I pass the '-update' option, it
> > writes the content of file '2' to folder '1' (making the folder '1' a
> > file in the destination). So it looks like it is treating the destination
> > for file distcpsrc/1/2 as distcpdest/1.
> >
> > Thanks,
> > Murali
> >
> >> -----Original Message-----
> >> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
> >> Sent: Wednesday, July 23, 2008 1:13 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: distcp skipping the file
> >>
> >> There were many fixes and improvements to distcp in 0.16, but most of
> >> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
> >> empty? Anything already existing at the destination is skipped. -C
> >>
> >> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
> >>
> >>> Hi,
> >>>
> >>> My source folder has a single folder and a single file inside that.
> >>>
> >>> /user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22
> >>>
> >>> In the destination, it is creating the folder '1' but not the file
> >>> '2'.
> >>>
> >>> The counters show 1 file has been skipped.
> >>>
> >>> 08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1
> >>>
> >>> If I create one more file in any directory under the distcpsrc
> >>> folder, it copies both files properly. Is this a bug?
> >>>
> >>> [I am using 0.15.3]
> >>>
> >>> Thanks,
> >>> Murali
 


Re: distcp skipping the file

Posted by Chris Douglas <ch...@yahoo-inc.com>.
The -update behavior is by design.

Could you provide the command line, and the directory structure before  
and after issuing the copy? -C

On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:

> Hi,
> I am using 0.15.3 and the destination is empty. One more
> behavior that I am seeing is that if I pass the '-update' option, it
> writes the content of file '2' to folder '1' (making the folder '1' a
> file in the destination). So it looks like it is treating the destination
> for file distcpsrc/1/2 as distcpdest/1.
>
> Thanks,
> Murali
>
>> -----Original Message-----
>> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
>> Sent: Wednesday, July 23, 2008 1:13 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: distcp skipping the file
>>
>> There were many fixes and improvements to distcp in 0.16, but most of
>> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
>> empty? Anything already existing at the destination is skipped. -C
>>
>> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
>>
>>> Hi,
>>>
>>> My source folder has a single folder and a single file inside that.
>>>
>>> /user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22
>>>
>>> In the destination, it is creating the folder '1' but not the file
>>> '2'.
>>>
>>> The counters show 1 file has been skipped.
>>>
>>> 08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1
>>>
>>>
>>>
>>> If I create one more file in any directory under the distcpsrc
>>> folder, it copies both files properly. Is this a bug?
>>>
>>> [I am using 0.15.3]
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Murali
>>>
>


RE: distcp skipping the file

Posted by Murali Krishna <mu...@yahoo-inc.com>.
Hi,
I am using 0.15.3 and the destination is empty. One more
behavior that I am seeing is that if I pass the '-update' option, it
writes the content of file '2' to folder '1' (making the folder '1' a
file in the destination). So it looks like it is treating the destination
for file distcpsrc/1/2 as distcpdest/1.

Thanks,
Murali

> -----Original Message-----
> From: Chris Douglas [mailto:chrisdo@yahoo-inc.com]
> Sent: Wednesday, July 23, 2008 1:13 AM
> To: core-user@hadoop.apache.org
> Subject: Re: distcp skipping the file
> 
> There were many fixes and improvements to distcp in 0.16, but most of
> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
> empty? Anything already existing at the destination is skipped. -C
> 
> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
> 
> > Hi,
> >
> > My source folder has a single folder and a single file inside that.
> >
> > /user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22
> >
> > In the destination, it is creating the folder '1' but not the file
> > '2'.
> >
> > The counters show 1 file has been skipped.
> >
> > 08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1
> >
> >
> >
> > If I create one more file in any directory under the distcpsrc
> > folder, it copies both files properly. Is this a bug?
> >
> > [I am using 0.15.3]
> >
> >
> >
> > Thanks,
> >
> > Murali
> >


Re: distcp skipping the file

Posted by Chris Douglas <ch...@yahoo-inc.com>.
There were many fixes and improvements to distcp in 0.16, but most of  
the critical fixes made it into 0.15.2 and 0.15.3. Is the destination  
empty? Anything already existing at the destination is skipped. -C

On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:

> Hi,
>
> My source folder has a single folder and a single file inside that.
>
> /user/<user>/distcpsrc/1/2 <r 3>   4       2008-07-22 04:22
>
> In the destination, it is creating the folder '1' but not the file  
> '2'.
>
> The counters show 1 file has been skipped.
>
> 08/07/22 04:22:36 INFO mapred.JobClient:     Files skipped=1
>
>
>
> If I create one more file in any directory under the distcpsrc
> folder, it copies both files properly. Is this a bug?
>
> [I am using 0.15.3]
>
>
>
> Thanks,
>
> Murali
>