Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/03/02 00:21:41 UTC

Re: copy chunk of hadoop output

When I try this.. I get an error
cat: Unable to write to output stream.

Is this a permissions issue?
How do I resolve this?
Thanks


On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:

> No problem JM, I was confused as well.
>
> AFAIK, there's no shell utility that can let you specify an offset #
> of bytes to start off with (similar to skip in dd?), but that can be
> done from the FS API.
>
> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
> > Hi Harsh,
> >
> > My bad.
> >
> > I read the example quickly and I don't know why I thought you used tail
> > and not head.
> >
> > head will work perfectly. But tail will not, since it will need to read
> > the entire file. My comment was for tail, not for head, and is therefore
> > not applicable to the example you gave.
> >
> >
> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
> >
> > Will have to download the entire file.
> >
> > Is there a way to "jump" into a certain position in a file and "cat"
> from there?
> >
> > JM
> >
> > 2013/2/20, Harsh J <ha...@cloudera.com>:
> >> Hi JM,
> >>
> >> I am not sure how "dangerous" it is, since we're using a pipe here,
> >> and as you yourself note, it will only run until the last bytes
> >> have been received and will then terminate.
> >>
> >> The -cat process will terminate because the
> >> process we're piping to will terminate first after it reaches its goal
> >> of -c <N bytes>; so certainly the "-cat" program will not fetch the
> >> whole file down, but it may fetch a few extra bytes over the wire
> >> due to the use of read buffers (the extra data won't be put into the target
> >> file, and gets discarded).
> >>
> >> We can try it out and observe the "clienttrace" logged
> >> at the DN at the end of the -cat's read. Here's an example:
> >>
> >> I wrote a ~1.6 MB file into a file called "foo.jar"; see "bytes"
> >> below, it's ~1.58 MB:
> >>
> >> 2013-02-20 23:55:19,777 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> >> duration: 192289000
> >>
> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
> >> store first 5 bytes onto a local file:
> >>
> >> Asserting that post command we get 5 bytes:
> >> ➜  ~ wc -c foo.xml
> >>        5 foo.xml
> >>
> >> Asserting that DN didn't IO-read the whole file, see the read op below
> >> and its "bytes" parameter, its only about 193 KB, not the whole block
> >> of 1.58 MB we wrote earlier:
> >>
> >> 2013-02-21 00:01:32,437 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> >> duration: 19207000
> >>
> >> I don't see how this is any more dangerous than doing a
> >> -copyToLocal/-get, which retrieves the whole file anyway?
> >>
> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
> >> <je...@spaggiari.org> wrote:
> >>> But be careful.
> >>>
> >>> hadoop fs -cat will retrieve the entire file and will only finish when
> >>> it has retrieved the last bytes you are looking for.
> >>>
> >>> If your file is many GB in size, it will take a lot of time for this
> >>> command to complete and will put some pressure on your network.
> >>>
> >>> JM
> >>>
> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
> >>>> Awesome thanks :)
> >>>>
> >>>>
> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com> wrote:
> >>>>
> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
> example:
> >>>>>
> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
> >>>>>
> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <ja...@gmail.com>
> >>>>> wrote:
> >>>>> > Hi,
> >>>>> >   I was wondering in the following command:
> >>>>> >
> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
> >>>>> > can we specify copying not the full file but, say, x MB of it to
> >>>>> > the local drive?
> >>>>> >
> >>>>> > Is something like this possible
> >>>>> > Thanks
> >>>>> > Jamal
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Harsh J
> >>>>>
> >>>>
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>
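For reference, the FS API route Harsh mentions in the quoted thread above (open the file, seek to an offset, copy a byte range) could look roughly like the minimal sketch below. It assumes the standard org.apache.hadoop.fs.FileSystem / FSDataInputStream classes; the source path, offset, and length are made-up values for illustration, not anything from this thread.

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyChunk {
  public static void main(String[] args) throws Exception {
    // Hypothetical path and sizes, for illustration only.
    Path src = new Path("hdfs:///user/jamal/part-00000");
    long offset = 1024L * 1024L;      // skip the first 1 MB
    long length = 5L * 1024L * 1024L; // then copy 5 MB

    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(src);
         OutputStream out = new FileOutputStream("chunk.local")) {
      in.seek(offset); // jump straight to the wanted position
      byte[] buf = new byte[64 * 1024];
      long remaining = length;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) {
          break; // hit end of file before 'length' bytes were copied
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}

Unlike the shell pipe, the seek() means the leading bytes never have to be streamed to the client at all; compile and run it with the Hadoop client libraries on the classpath.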

Re: copy chunk of hadoop output

Posted by Azuryy Yu <az...@gmail.com>.
Yes, just ignore this log.
On Mar 2, 2013 7:27 AM, "jamal sasha" <ja...@gmail.com> wrote:

> It copies, but it gives this error?
>
>
> On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <ja...@gmail.com> wrote:
>
>> When I try this.. I get an error
>> cat: Unable to write to output stream.
>>
>> Is this a permissions issue?
>> How do I resolve this?
>> Thanks
>>
>>
>> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> No problem JM, I was confused as well.
>>>
>>> AFAIK, there's no shell utility that can let you specify an offset #
>>> of bytes to start off with (similar to skip in dd?), but that can be
>>> done from the FS API.
>>>
>>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>>> <je...@spaggiari.org> wrote:
>>> > Hi Harsh,
>>> >
>>> > My bad.
>>> >
>>> > I read the example quickly and I don't know why I thought you used tail
>>> > and not head.
>>> >
>>> > head will work perfectly. But tail will not, since it will need to read
>>> > the entire file. My comment was for tail, not for head, and is therefore
>>> > not applicable to the example you gave.
>>> >
>>> >
>>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>>> >
>>> > Will have to download the entire file.
>>> >
>>> > Is there a way to "jump" into a certain position in a file and "cat"
>>> from there?
>>> >
>>> > JM
>>> >
>>> > 2013/2/20, Harsh J <ha...@cloudera.com>:
>>> >> Hi JM,
>>> >>
>>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>>> >> and as you yourself note, it will only run until the last bytes
>>> >> have been received and will then terminate.
>>> >>
>>> >> The -cat process will terminate because the
>>> >> process we're piping to will terminate first after it reaches its goal
>>> >> of -c <N bytes>; so certainly the "-cat" program will not fetch the
>>> >> whole file down, but it may fetch a few extra bytes over the wire
>>> >> due to the use of read buffers (the extra data won't be put into the
>>> >> target file, and gets discarded).
>>> >>
>>> >> We can try it out and observe the "clienttrace" logged
>>> >> at the DN at the end of the -cat's read. Here's an example:
>>> >>
>>> >> I wrote a ~1.6 MB file into a file called "foo.jar"; see "bytes"
>>> >> below, it's ~1.58 MB:
>>> >>
>>> >> 2013-02-20 23:55:19,777 INFO
>>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>>> >> duration: 192289000
>>> >>
>>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>>> >> store first 5 bytes onto a local file:
>>> >>
>>> >> Asserting that post command we get 5 bytes:
>>> >> ➜  ~ wc -c foo.xml
>>> >>        5 foo.xml
>>> >>
>>> >> Asserting that DN didn't IO-read the whole file, see the read op below
>>> >> and its "bytes" parameter, its only about 193 KB, not the whole block
>>> >> of 1.58 MB we wrote earlier:
>>> >>
>>> >> 2013-02-21 00:01:32,437 INFO
>>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>>> >> duration: 19207000
>>> >>
>>> >> I don't see how this is any more dangerous than doing a
>>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>>> >>
>>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>>> >> <je...@spaggiari.org> wrote:
>>> >>> But be careful.
>>> >>>
>>> >>> hadoop fs -cat will retrieve the entire file and will only finish when
>>> >>> it has retrieved the last bytes you are looking for.
>>> >>>
>>> >>> If your file is many GB in size, it will take a lot of time for this
>>> >>> command to complete and will put some pressure on your network.
>>> >>>
>>> >>> JM
>>> >>>
>>> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
>>> >>>> Awesome thanks :)
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com>
>>> wrote:
>>> >>>>
>>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>>> example:
>>> >>>>>
>>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>>> >>>>>
>>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <
>>> jamalshasha@gmail.com>
>>> >>>>> wrote:
>>> >>>>> > Hi,
>>> >>>>> >   I was wondering in the following command:
>>> >>>>> >
>>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>>> >>>>> > can we specify copying not the full file but, say, x MB of it to
>>> >>>>> > the local drive?
>>> >>>>> >
>>> >>>>> > Is something like this possible
>>> >>>>> > Thanks
>>> >>>>> > Jamal
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Harsh J
>>> >>>>>
>>> >>>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
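On why that "Unable to write to output stream." message is harmless: the same short-circuit behaviour can be reproduced with ordinary shell tools. A minimal sketch, assuming GNU coreutils (the file name is made up):

yes | head -c 5 > five-bytes.txt
wc -c five-bytes.txt    # reports 5 bytes

head exits as soon as it has its 5 bytes and closes the pipe, so the writer's next write to the closed pipe fails. A plain shell tool like yes is simply killed by the broken pipe; hadoop fs -cat appears to report that failed write as "cat: Unable to write to output stream." instead, which is why it can be ignored once the local file has the bytes you asked for.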

Re: copy chunk of hadoop output

Posted by jamal sasha <ja...@gmail.com>.
It copies, but it gives this error?


On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <ja...@gmail.com> wrote:

> When I try this.. I get an error
> cat: Unable to write to output stream.
>
> Is this a permissions issue?
> How do I resolve this?
> Thanks
>
>
> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> No problem JM, I was confused as well.
>>
>> AFAIK, there's no shell utility that can let you specify an offset #
>> of bytes to start off with (similar to skip in dd?), but that can be
>> done from the FS API.
>>
>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>> <je...@spaggiari.org> wrote:
>> > Hi Harsh,
>> >
>> > My bad.
>> >
>> > I read the example quickly and I don't know why I thought you used tail
>> > and not head.
>> >
>> > head will work perfectly. But tail will not, since it will need to read
>> > the entire file. My comment was for tail, not for head, and is therefore
>> > not applicable to the example you gave.
>> >
>> >
>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>> >
>> > Will have to download the entire file.
>> >
>> > Is there a way to "jump" into a certain position in a file and "cat"
>> from there?
>> >
>> > JM
>> >
>> > 2013/2/20, Harsh J <ha...@cloudera.com>:
>> >> Hi JM,
>> >>
>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>> >> and as you yourself note, it will only run until the last bytes
>> >> have been received and will then terminate.
>> >>
>> >> The -cat process will terminate because the
>> >> process we're piping to will terminate first after it reaches its goal
>> >> of -c <N bytes>; so certainly the "-cat" program will not fetch the
>> >> whole file down, but it may fetch a few extra bytes over the wire
>> >> due to the use of read buffers (the extra data won't be put into the target
>> >> file, and gets discarded).
>> >>
>> >> We can try it out and observe the "clienttrace" logged
>> >> at the DN at the end of the -cat's read. Here's an example:
>> >>
>> >> I wrote a ~1.6 MB file into a file called "foo.jar"; see "bytes"
>> >> below, it's ~1.58 MB:
>> >>
>> >> 2013-02-20 23:55:19,777 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 192289000
>> >>
>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>> >> store first 5 bytes onto a local file:
>> >>
>> >> Asserting that post command we get 5 bytes:
>> >> ➜  ~ wc -c foo.xml
>> >>        5 foo.xml
>> >>
>> >> Asserting that DN didn't IO-read the whole file, see the read op below
>> >> and its "bytes" parameter, its only about 193 KB, not the whole block
>> >> of 1.58 MB we wrote earlier:
>> >>
>> >> 2013-02-21 00:01:32,437 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 19207000
>> >>
>> >> I don't see how this is any more dangerous than doing a
>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>> >>
>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>> >> <je...@spaggiari.org> wrote:
>> >>> But be careful.
>> >>>
>> >>> hadoop fs -cat will retrieve the entire file and will only finish when
>> >>> it has retrieved the last bytes you are looking for.
>> >>>
>> >>> If your file is many GB in size, it will take a lot of time for this
>> >>> command to complete and will put some pressure on your network.
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
>> >>>> Awesome thanks :)
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>>>
>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>> example:
>> >>>>>
>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>> >>>>>
>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <jamalshasha@gmail.com
>> >
>> >>>>> wrote:
>> >>>>> > Hi,
>> >>>>> >   I was wondering in the following command:
>> >>>>> >
>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>> >>>>> > can we specify copying not the full file but, say, x MB of it to
>> >>>>> > the local drive?
>> >>>>> >
>> >>>>> > Is something like this possible
>> >>>>> > Thanks
>> >>>>> > Jamal
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>>>>
>> >>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >>
>>
>>
>>
>> --
>> Harsh J
>>
>
>
