Posted to hdfs-user@hadoop.apache.org by Alex Parvulescu <al...@gmail.com> on 2010/03/16 10:59:08 UTC

Output 'hadoop dfs -get' to stdout

Hello,

Is there a reason for which 'hadoop dfs -get' will not output to stdout?

I see 'hadoop dfs -put' can handle stdin.  It would seem that dfs should
also support outputting to stdout.


thanks,
alex

Re: Output 'hadoop dfs -get' to stdout

Posted by Alex Parvulescu <al...@gmail.com>.
Hello Olivier,

It works like a charm :)

While we are on the subject, I sent an email to
common-user@hadoop.apache.org about HDFS that remained unanswered. I'll
reproduce it here, as I think this is a better place for it:

I want to achieve the 'hadoop dfs -getmerge' functionality over HTTP. The
closest I could find is the 'Download this file' link, but that is available
only for individual part files, not for the whole directory (
http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F%2Fpart-00000
)

It seems you can push a CSV file to Solr 1.4 by URL, that is, a link to the
actual CSV file. The problem is that a directory is not available for
download as a single merged file in the HDFS-over-HTTP interface, only the
individual parts.

As all the pieces are already there, it doesn't make sense to me to add an
HTTP (Apache?) server to the mix just to serve the processed files. I
should be able to do that with a special URL or something, maybe along the
lines of ... /streamMergedFile?whateverPathToAFileOrDir
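
For now, something along these lines looks possible with the existing
streamFile servlet (the part names are taken from the directory listing
further down in this thread; in general they would have to be discovered
first, so treat this as only a sketch):

  # fetch each part over the existing streamFile servlet and concatenate locally
  for p in part-00000 part-00001; do
    curl -s "http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F$p"
  done > merged.csv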

As you can see it's related to my initial question on this thread :)

thanks for your time,
alex

On Tue, Mar 16, 2010 at 4:52 PM, Varene Olivier <va...@echo.fr> wrote:

>
> Supposing your part-r-XXXX files are fully ordered,
>
> you can do
>
> hadoop dfs -cat "output/solr/part-*" > yourLocalFile
>
> tada :)
>
> Cheers
>
> Olivier
>
>
> Alex Parvulescu wrote:
>
>> Hello,
>>
>> one minor correction.
>>
>> I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
>> equivalent of '-get' and they both handle only files.
>>
>> I'd like to see an equivalent of 'getmerge' to stdout.
>>
>> sorry for the confusion
>> alex
>>
>> On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu <
>> alex.parvulescu@gmail.com <ma...@gmail.com>> wrote:
>>
>>    Hello Olivier,
>>
>>    I've tried 'cat'. This is the error I get: 'cat: Source must be a
>> file.'
>>    This happens when I try to get all parts from a directory as a
>>    single .csv file.
>>
>>    Something like that:
>>      hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
>>      cat: Source must be a file.
>>
>>    This is what the dir looks like
>>      hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
>>      Found 3 items
>>      drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/_logs
>>      -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/part-00000
>>      -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36
>>    /user/hadoop-user/output/solr/part-00001
>>
>>    It seems -get can merge everything into one file but cannot output to
>>    stdout, while 'cat' can write to stdout but it seems I have to fetch the
>>    parts one by one.
>>
>>    Or am I missing something?
>>
>>    thanks,
>>    alex
>>
>>
>>    On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <varene@echo.fr
>>    <ma...@echo.fr>> wrote:
>>
>>        Hello Alex,
>>
>>        get writes a file to your local filesystem
>>
>>        hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
>>
>>        with
>>         src : your file in your hdfs
>>         localdst : the name of the file with the collected data (from
>>        src) on
>>            your local filesystem
>>
>>
>>        To get the results to STDOUT,
>>        you can use cat
>>
>>        hadoop dfs [-cat <src>]
>>
>>        with src : your file in your hdfs
>>
>>        Regards
>>        Olivier
>>
>>        Alex Parvulescu wrote:
>>
>>            Hello,
>>
>>            Is there a reason for which 'hadoop dfs -get' will not
>>            output to stdout?
>>
>>            I see 'hadoop dfs -put' can handle stdin.  It would seem
>>            that dfs should also support outputting to stdout.
>>
>>
>>            thanks,
>>            alex
>>
>>
>>
>>
>>
>>

Re: Output 'hadoop dfs -get' to stdout

Posted by Varene Olivier <va...@echo.fr>.
Supposing your part-r-XXXX files are fully ordered,

you can do

hadoop dfs -cat "output/solr/part-*" > yourLocalFile

tada :)
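
And since -cat writes to stdout, the merged stream can just as well be piped
straight into another tool instead of a local file (the consumer here is only
an illustration):

  # e.g. count the merged lines without writing a local copy
  hadoop dfs -cat "output/solr/part-*" | wc -l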

Cheers
Olivier


Alex Parvulescu wrote:
> Hello,
> 
> one minor correction.
> 
> I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
> equivalent of '-get' and they both handle only files.
> 
> I'd like to see an equivalent of 'getmerge' to stdout.
> 
> sorry for the confusion
> alex
> 
> On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu 
> <alex.parvulescu@gmail.com <ma...@gmail.com>> wrote:
> 
>     Hello Olivier,
> 
>     I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
>     This happens when I try to get all parts from a directory as a
>     single .csv file.
> 
>     Something like that:
>       hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
>       cat: Source must be a file.
>      
>     This is what the dir looks like
>       hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
>       Found 3 items
>       drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36
>     /user/hadoop-user/output/solr/_logs
>       -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36
>     /user/hadoop-user/output/solr/part-00000
>       -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36
>     /user/hadoop-user/output/solr/part-00001
> 
>     It seems -get can merge everything into one file but cannot output to
>     stdout, while 'cat' can write to stdout but it seems I have to fetch the
>     parts one by one.
> 
>     Or am I missing something?
> 
>     thanks,
>     alex
> 
> 
>     On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <varene@echo.fr
>     <ma...@echo.fr>> wrote:
> 
>         Hello Alex,
> 
>         get writes a file to your local filesystem
> 
>         hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
> 
>         with
>          src : your file in your hdfs
>          localdst : the name of the file with the collected data (from
>         src) on
>             your local filesystem
> 
> 
>         To get the results to STDOUT,
>         you can use cat
> 
>         hadoop dfs [-cat <src>]
> 
>         with src : your file in your hdfs
> 
>         Regards
>         Olivier
> 
>         Alex Parvulescu wrote:
> 
>             Hello,
> 
>             Is there a reason for which 'hadoop dfs -get' will not
>             output to stdout?
> 
>             I see 'hadoop dfs -put' can handle stdin.  It would seem
>             that dfs should also support outputting to stdout.
> 
> 
>             thanks,
>             alex
> 
> 
> 
> 
> 

Re: Output 'hadoop dfs -get' to stdout

Posted by Alex Parvulescu <al...@gmail.com>.
Hello,

one minor correction.

I'm talking about 'hadoop dfs -getmerge'. You are right, '-cat' is the
equivalent of '-get' and they both handle only files.

I'd like to see an equivalent of 'getmerge' to stdout.
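
(As it stands, -getmerge always wants a local destination file, something
like the following, where the local output path is just an example:

  hadoop dfs -getmerge /user/hadoop-user/output/solr/ /tmp/solr-merged.csv

There is no form of it that writes the merged result to stdout.)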

sorry for the confusion
alex

On Tue, Mar 16, 2010 at 11:31 AM, Alex Parvulescu <alex.parvulescu@gmail.com
> wrote:

> Hello Olivier,
>
> I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
> This happens when I try to get all parts from a directory as a single .csv
> file.
>
> Something like that:
>   hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
>   cat: Source must be a file.
>
> This is what the dir looks like
>   hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
>   Found 3 items
>   drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36
> /user/hadoop-user/output/solr/_logs
>   -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36
> /user/hadoop-user/output/solr/part-00000
>   -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36
> /user/hadoop-user/output/solr/part-00001
>
> It seems -get can merge everything into one file but cannot output to stdout,
> while 'cat' can write to stdout but it seems I have to fetch the parts one by
> one.
>
> Or am I missing something?
>
> thanks,
> alex
>
>
> On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <va...@echo.fr> wrote:
>
>> Hello Alex,
>>
>> get writes a file to your local filesystem
>>
>> hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
>>
>> with
>>  src : your file in your hdfs
>>  localdst : the name of the file with the collected data (from src) on
>>     your local filesystem
>>
>>
>> To get the results to STDOUT,
>> you can use cat
>>
>> hadoop dfs [-cat <src>]
>>
>> with src : your file in your hdfs
>>
>> Regards
>> Olivier
>>
>> Alex Parvulescu wrote:
>>
>>  Hello,
>>>
>>> Is there a reason for which 'hadoop dfs -get' will not output to stdout?
>>>
>>> I see 'hadoop dfs -put' can handle stdin.  It would seem that dfs should
>>> also support outputting to stdout.
>>>
>>>
>>> thanks,
>>> alex
>>>
>>>
>>>
>>>
>

Re: Output 'hadoop dfs -get' to stdout

Posted by Alex Parvulescu <al...@gmail.com>.
Hello Olivier,

I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
This happens when I try to get all parts from a directory as a single .csv
file.

Something like that:
  hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
  cat: Source must be a file.

This is what the dir looks like
  hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
  Found 3 items
  drwxr-xr-x   - hadoop supergroup          0 2010-03-12 16:36
/user/hadoop-user/output/solr/_logs
  -rw-r--r--   2 hadoop supergroup   64882566 2010-03-12 16:36
/user/hadoop-user/output/solr/part-00000
  -rw-r--r--   2 hadoop supergroup   51388943 2010-03-12 16:36
/user/hadoop-user/output/solr/part-00001

It seems -get can merge everything into one file but cannot output to stdout,
while 'cat' can write to stdout but it seems I have to fetch the parts one by
one.

Or am I missing something?

thanks,
alex

On Tue, Mar 16, 2010 at 11:28 AM, Varene Olivier <va...@echo.fr> wrote:

> Hello Alex,
>
> get writes a file to your local filesystem
>
> hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
>
> with
>  src : your file in your hdfs
>  localdst : the name of the file with the collected data (from src) on
>     your local filesystem
>
>
> To get the results to STDOUT,
> you can use cat
>
> hadoop dfs [-cat <src>]
>
> with src : your file in your hdfs
>
> Regards
> Olivier
>
> Alex Parvulescu wrote:
>
>  Hello,
>>
>> Is there a reason for which 'hadoop dfs -get' will not output to stdout?
>>
>> I see 'hadoop dfs -put' can handle stdin.  It would seem that dfs should
>> also support outputting to stdout.
>>
>>
>> thanks,
>> alex
>>
>>
>>
>>

Re: Output 'hadoop dfs -get' to stdout

Posted by Varene Olivier <va...@echo.fr>.
Hello Alex,

get writes a file to your local filesystem

hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]

with
   src : your file in your hdfs
   localdst : the name of the file with the collected data (from src) on
      your local filesystem


To get the results to STDOUT,
you can use cat

hadoop dfs [-cat <src>]

with src : your file in your hdfs
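
Concrete invocations, using the HDFS paths that appear elsewhere in this
thread (the local file name is just an example):

  hadoop dfs -get /user/hadoop-user/output/solr/part-00000 ./part-00000.csv
  hadoop dfs -cat /user/hadoop-user/output/solr/part-00000 | head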

Regards
Olivier

Alex Parvulescu wrote:
> Hello,
> 
> Is there a reason for which 'hadoop dfs -get' will not output to stdout?
> 
> I see 'hadoop dfs -put' can handle stdin.  It would seem that dfs should
> also support outputting to stdout.
> 
> 
> thanks,
> alex
> 
> 
>