Posted to common-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/02/19 23:08:39 UTC

copy chunk of hadoop output

Hi,
  I was wondering, regarding the following command:

bin/hadoop dfs -copyToLocal hdfspath localpath
can we specify copying not the full file but only, say, x MB of it to the local drive?

Is something like this possible?
Thanks
Jamal

Re: copy chunk of hadoop output

Posted by Azuryy Yu <az...@gmail.com>.
Yes, just ignore this log: it is only a broken pipe. head exits once it has
read its -c N bytes and closes the pipe, so -cat's final write fails; the
bytes already copied to the local file are fine.
On Mar 2, 2013 7:27 AM, "jamal sasha" <ja...@gmail.com> wrote:

> Though it copies, it gives this error?
>
>
> On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <ja...@gmail.com> wrote:
>
>> When I try this, I get an error:
>> cat: Unable to write to output stream.
>>
>> Is this a permissions issue?
>> How do I resolve this?
>> Thanks
>>
>>
>> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> No problem JM, I was confused as well.
>>>
>>> AFAIK, there's no shell utility that lets you specify an offset (a
>>> number of bytes to skip, similar to dd's skip), but that can be
>>> done from the FS API.
>>>
>>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>>> <je...@spaggiari.org> wrote:
>>> > Hi Harsh,
>>> >
>>> > My bad.
>>> >
>>> > I read the example quickly and I don't know why I thought you used tail
>>> > and not head.
>>> >
>>> > head will work perfectly. But tail will not, since it will need to read
>>> > the entire file. My comment was for tail, not for head, and therefore
>>> > not applicable to the example you gave.
>>> >
>>> >
>>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>>> >
>>> > Will have to download the entire file.
>>> >
>>> > Is there a way to "jump" into a certain position in a file and "cat"
>>> from there?
>>> >
>>> > JM
>>> >
>>> > 2013/2/20, Harsh J <ha...@cloudera.com>:
>>> >> Hi JM,
>>> >>
>>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>>> >> and as you yourself note, it will only run until the last bytes
>>> >> have been received, and then terminate.
>>> >>
>>> >> The -cat process will terminate because the
>>> >> process we're piping to will terminate first after it reaches its goal
>>> >> of -c <N bytes>; so the "-cat" program will certainly not fetch the
>>> >> whole file down, but it may fetch a few extra bytes over the wire
>>> >> due to the use of read buffers (the extra data won't be put into the
>>> >> target file; it gets discarded).
>>> >>
>>> >> We can try it out and observe the "clienttrace" logged
>>> >> at the DN at the end of the -cat's read. Here's an example:
>>> >>
>>> >> I wrote a ~1.6 MB file called "foo.jar"; see "bytes"
>>> >> below, it's ~1.58 MB:
>>> >>
>>> >> 2013-02-20 23:55:19,777 INFO
>>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>>> >> duration: 192289000
>>> >>
>>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>>> >> store the first 5 bytes into a local file:
>>> >>
>>> >> Asserting that after the command we get 5 bytes:
>>> >> ➜  ~ wc -c foo.xml
>>> >>        5 foo.xml
>>> >>
>>> >> Asserting that the DN didn't IO-read the whole file, see the read op below
>>> >> and its "bytes" parameter: it's only about 193 KB, not the whole block
>>> >> of 1.58 MB we wrote earlier:
>>> >>
>>> >> 2013-02-21 00:01:32,437 INFO
>>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>>> >> duration: 19207000
>>> >>
>>> >> I don't see how this is any more dangerous than doing a
>>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>>> >>
>>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>>> >> <je...@spaggiari.org> wrote:
>>> >>> But be careful.
>>> >>>
>>> >>> hadoop fs -cat will retrieve the entire file and finish only once it
>>> >>> has retrieved the last bytes you are looking for.
>>> >>>
>>> >>> If your file is many GB in size, it will take a long time for this
>>> >>> command to complete and will put some pressure on your network.
>>> >>>
>>> >>> JM
>>> >>>
>>> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
>>> >>>> Awesome thanks :)
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com>
>>> wrote:
>>> >>>>
>>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>>> example:
>>> >>>>>
>>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>>> >>>>>
>>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <
>>> jamalshasha@gmail.com>
>>> >>>>> wrote:
>>> >>>>> > Hi,
>>> >>>>> >   I was wondering, regarding the following command:
>>> >>>>> >
>>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>>> >>>>> > can we specify copying not the full file but only, say, x MB of
>>> >>>>> > it to the local drive?
>>> >>>>> >
>>> >>>>> > Is something like this possible?
>>> >>>>> > Thanks
>>> >>>>> > Jamal
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Harsh J
>>> >>>>>
>>> >>>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: copy chunk of hadoop output

Posted by jamal sasha <ja...@gmail.com>.
Though it copies, it gives this error?


On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha <ja...@gmail.com> wrote:

> When I try this, I get an error:
> cat: Unable to write to output stream.
>
> Is this a permissions issue?
> How do I resolve this?
> Thanks
>
>
> On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> No problem JM, I was confused as well.
>>
>> AFAIK, there's no shell utility that lets you specify an offset (a
>> number of bytes to skip, similar to dd's skip), but that can be
>> done from the FS API.
>>
>> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
>> <je...@spaggiari.org> wrote:
>> > Hi Harsh,
>> >
>> > My bad.
>> >
>> > I read the example quickly and I don't know why I thought you used tail
>> > and not head.
>> >
>> > head will work perfectly. But tail will not, since it will need to read
>> > the entire file. My comment was for tail, not for head, and therefore
>> > not applicable to the example you gave.
>> >
>> >
>> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
>> >
>> > Will have to download the entire file.
>> >
>> > Is there a way to "jump" into a certain position in a file and "cat"
>> from there?
>> >
>> > JM
>> >
>> > 2013/2/20, Harsh J <ha...@cloudera.com>:
>> >> Hi JM,
>> >>
>> >> I am not sure how "dangerous" it is, since we're using a pipe here,
>> >> and as you yourself note, it will only run until the last bytes
>> >> have been received, and then terminate.
>> >>
>> >> The -cat process will terminate because the
>> >> process we're piping to will terminate first after it reaches its goal
>> >> of -c <N bytes>; so the "-cat" program will certainly not fetch the
>> >> whole file down, but it may fetch a few extra bytes over the wire
>> >> due to the use of read buffers (the extra data won't be put into the
>> >> target file; it gets discarded).
>> >>
>> >> We can try it out and observe the "clienttrace" logged
>> >> at the DN at the end of the -cat's read. Here's an example:
>> >>
>> >> I wrote a ~1.6 MB file called "foo.jar"; see "bytes"
>> >> below, it's ~1.58 MB:
>> >>
>> >> 2013-02-20 23:55:19,777 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
>> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 192289000
>> >>
>> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
>> >> store the first 5 bytes into a local file:
>> >>
>> >> Asserting that after the command we get 5 bytes:
>> >> ➜  ~ wc -c foo.xml
>> >>        5 foo.xml
>> >>
>> >> Asserting that the DN didn't IO-read the whole file, see the read op below
>> >> and its "bytes" parameter: it's only about 193 KB, not the whole block
>> >> of 1.58 MB we wrote earlier:
>> >>
>> >> 2013-02-21 00:01:32,437 INFO
>> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
>> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
>> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
>> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
>> >> duration: 19207000
>> >>
>> >> I don't see how this is any more dangerous than doing a
>> >> -copyToLocal/-get, which retrieves the whole file anyway?
>> >>
>> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
>> >> <je...@spaggiari.org> wrote:
>> >>> But be careful.
>> >>>
>> >>> hadoop fs -cat will retrieve the entire file and finish only once it
>> >>> has retrieved the last bytes you are looking for.
>> >>>
>> >>> If your file is many GB in size, it will take a long time for this
>> >>> command to complete and will put some pressure on your network.
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
>> >>>> Awesome thanks :)
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>>>
>> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
>> example:
>> >>>>>
>> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>> >>>>>
>> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <jamalshasha@gmail.com
>> >
>> >>>>> wrote:
>> >>>>> > Hi,
>> >>>>> >   I was wondering, regarding the following command:
>> >>>>> >
>> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
>> >>>>> > can we specify copying not the full file but only, say, x MB of
>> >>>>> > it to the local drive?
>> >>>>> >
>> >>>>> > Is something like this possible?
>> >>>>> > Thanks
>> >>>>> > Jamal
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>>>>
>> >>>>
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >>
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: copy chunk of hadoop output

Posted by jamal sasha <ja...@gmail.com>.
When I try this, I get an error:
cat: Unable to write to output stream.

Is this a permissions issue?
How do I resolve this?
Thanks


On Wed, Feb 20, 2013 at 12:21 PM, Harsh J <ha...@cloudera.com> wrote:

> No problem JM, I was confused as well.
>
> AFAIK, there's no shell utility that lets you specify an offset (a
> number of bytes to skip, similar to dd's skip), but that can be
> done from the FS API.
>
> On Thu, Feb 21, 2013 at 1:14 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
> > Hi Harsh,
> >
> > My bad.
> >
> > I read the example quickly and I don't know why I thought you used tail
> > and not head.
> >
> > head will work perfectly. But tail will not, since it will need to read
> > the entire file. My comment was for tail, not for head, and therefore
> > not applicable to the example you gave.
> >
> >
> > hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file
> >
> > Will have to download the entire file.
> >
> > Is there a way to "jump" into a certain position in a file and "cat"
> from there?
> >
> > JM
> >
> > 2013/2/20, Harsh J <ha...@cloudera.com>:
> >> Hi JM,
> >>
> >> I am not sure how "dangerous" it is, since we're using a pipe here,
> >> and as you yourself note, it will only run until the last bytes
> >> have been received, and then terminate.
> >>
> >> The -cat process will terminate because the
> >> process we're piping to will terminate first after it reaches its goal
> >> of -c <N bytes>; so the "-cat" program will certainly not fetch the
> >> whole file down, but it may fetch a few extra bytes over the wire
> >> due to the use of read buffers (the extra data won't be put into the
> >> target file; it gets discarded).
> >>
> >> We can try it out and observe the "clienttrace" logged
> >> at the DN at the end of the -cat's read. Here's an example:
> >>
> >> I wrote a ~1.6 MB file called "foo.jar"; see "bytes"
> >> below, it's ~1.58 MB:
> >>
> >> 2013-02-20 23:55:19,777 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> >> /127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
> >> HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> >> duration: 192289000
> >>
> >> I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
> >> store the first 5 bytes into a local file:
> >>
> >> Asserting that after the command we get 5 bytes:
> >> ➜  ~ wc -c foo.xml
> >>        5 foo.xml
> >>
> >> Asserting that the DN didn't IO-read the whole file, see the read op below
> >> and its "bytes" parameter: it's only about 193 KB, not the whole block
> >> of 1.58 MB we wrote earlier:
> >>
> >> 2013-02-21 00:01:32,437 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> >> /127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
> >> HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> >> duration: 19207000
> >>
> >> I don't see how this is any more dangerous than doing a
> >> -copyToLocal/-get, which retrieves the whole file anyway?
> >>
> >> On Wed, Feb 20, 2013 at 9:25 PM, Jean-Marc Spaggiari
> >> <je...@spaggiari.org> wrote:
> >>> But be careful.
> >>>
> >>> hadoop fs -cat will retrieve the entire file and finish only once it
> >>> has retrieved the last bytes you are looking for.
> >>>
> >>> If your file is many GB in size, it will take a long time for this
> >>> command to complete and will put some pressure on your network.
> >>>
> >>> JM
> >>>
> >>> 2013/2/19, jamal sasha <ja...@gmail.com>:
> >>>> Awesome thanks :)
> >>>>
> >>>>
> >>>> On Tue, Feb 19, 2013 at 2:14 PM, Harsh J <ha...@cloudera.com> wrote:
> >>>>
> >>>>> You can instead use 'fs -cat' and the 'head' coreutil, as one
> example:
> >>>>>
> >>>>> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
> >>>>>
> >>>>> On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha <ja...@gmail.com>
> >>>>> wrote:
> >>>>> > Hi,
> >>>>> >   I was wondering, regarding the following command:
> >>>>> >
> >>>>> > bin/hadoop dfs -copyToLocal hdfspath localpath
> >>>>> > can we specify copying not the full file but only, say, x MB of
> >>>>> > it to the local drive?
> >>>>> >
> >>>>> > Is something like this possible?
> >>>>> > Thanks
> >>>>> > Jamal
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Harsh J
> >>>>>
> >>>>
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>

Re: copy chunk of hadoop output

Posted by Harsh J <ha...@cloudera.com>.
No problem JM, I was confused as well.

AFAIK, there's no shell utility that lets you specify an offset in
bytes to start from (similar to dd's skip), but that can be
done from the FS API.
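
A minimal sketch of that FS API route (the path, offset, and length
below are made-up illustration values, and error handling is omitted):

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsChunkCopy {
  public static void main(String[] args) throws Exception {
    // Picks up the cluster's default filesystem from the loaded config.
    FileSystem fs = FileSystem.get(new Configuration());
    long offset = 1L * 1024 * 1024;   // skip the first 1 MB, like dd's skip
    long length = 5L * 1024 * 1024;   // then copy 5 MB
    try (FSDataInputStream in = fs.open(new Path("/user/jamal/part-00000"));
         OutputStream out = new FileOutputStream("chunk.local")) {
      in.seek(offset);                // jump straight to the offset
      byte[] buf = new byte[8192];
      long remaining = length;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n == -1) {
          break;                      // file ended before <length> bytes
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}

Packaged into a jar, this would run with something like "hadoop jar
chunk.jar HdfsChunkCopy" so that the cluster configuration is on the
classpath.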

--
Harsh J

Re: copy chunk of hadoop output

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Harsh,

My bad.

I read the example quickly and I don't know why I thought you used
tail and not head.

head will work perfectly. But tail will not, since it will need to
read the entire file. My comment was for tail, not for head, and
therefore not applicable to the example you gave.

hadoop fs -cat 100-byte-dfs-file | tail -c 5 > 5-byte-local-file

will have to download the entire file.

Is there a way to "jump" into a certain position in a file and "cat" from there?

JM
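
One way to do that jump is to ask for the file's length, seek to
length minus N, and copy from there to the end, which is the cheap
equivalent of tail -c N. A minimal sketch against the FileSystem API
(the path and byte count are made-up illustration values):

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTailChunk {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path("/user/jamal/part-00000"); // made-up path
    long tailBytes = 5;                            // like tail -c 5
    long fileLen = fs.getFileStatus(src).getLen(); // total file size
    try (FSDataInputStream in = fs.open(src);
         OutputStream out = new FileOutputStream("tail.local")) {
      in.seek(Math.max(0, fileLen - tailBytes));   // jump near the end
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {           // copy from there to EOF
        out.write(buf, 0, n);
      }
    }
  }
}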

Re: copy chunk of hadoop output

Posted by Harsh J <ha...@cloudera.com>.
Hi JM,

I am not sure how "dangerous" it is, since we're using a pipe here,
and as you yourself note, it will only run until the last bytes have
been fetched, and then terminate.

The -cat process will terminate because the process we're piping to
will terminate first, once it reaches its goal of -c <N bytes>; so the
"-cat" program will certainly not fetch the whole file down, though it
may fetch a few extra bytes over the wire due to read buffering (the
extra data won't be put into the target file; it gets discarded).

We can try it out and observe the "clienttrace" logged
at the DN at the end of the -cat's read. Here's an example:

I wrote a ~1.6 MB file into a file called "foo.jar"; see "bytes"
below, it's ~1.58 MB:

2013-02-20 23:55:19,777 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/127.0.0.1:58785, dest: /127.0.0.1:50010, bytes: 1658314, op:
HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_915204057_1, offset: 0,
srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
duration: 192289000

I ran the command "hadoop fs -cat foo.jar | head -c 5 > foo.xml" to
store the first 5 bytes into a local file:

Asserting that, post-command, we get 5 bytes:
➜  ~ wc -c foo.xml
       5 foo.xml

Asserting that the DN didn't IO-read the whole file, see the read op
below and its "bytes" parameter: it's only about 193 KB, not the whole
block of ~1.58 MB we wrote earlier:

2013-02-21 00:01:32,437 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/127.0.0.1:50010, dest: /127.0.0.1:58802, bytes: 198144, op:
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1698829178_1, offset: 0,
srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
duration: 19207000

I don't see how this is any more dangerous than doing a
-copyToLocal/-get, which retrieves the whole file anyway?
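
The same first-N-bytes fetch can also be done entirely client-side
with the FS API, with no shell pipe involved (and so without the
"cat: Unable to write to output stream" noise seen elsewhere in this
thread, which is presumably just the pipe closing once head exits). A
minimal sketch, with a made-up path and count:

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHeadChunk {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long count = 5; // like head -c 5
    try (FSDataInputStream in = fs.open(new Path("/user/jamal/foo.jar"));
         OutputStream out = new FileOutputStream("foo.xml")) {
      byte[] buf = new byte[8192];
      long remaining = count;
      int n;
      // Cap each read at the bytes still needed; the DFS client may
      // still buffer a bit more underneath, as the clienttrace shows.
      while (remaining > 0
          && (n = in.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}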

--
Harsh J

Re: copy chunk of hadoop output

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
But be careful.

hadoop fs -cat will retrieve the entire file and will only finish
once it has retrieved the last bytes you are looking for.

If your file is many GB in size, it will take a lot of time for this
command to complete and will put some pressure on your network.

JM


Re: copy chunk of hadoop output

Posted by jamal sasha <ja...@gmail.com>.
Awesome thanks :)


Re: copy chunk of hadoop output

Posted by Harsh J <ha...@cloudera.com>.
You can instead use 'fs -cat' and the 'head' coreutil, as one example:

hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file

--
Harsh J
