Posted to hdfs-user@hadoop.apache.org by Alex Parvulescu <al...@gmail.com> on 2010/03/16 10:59:08 UTC
Output 'hadoop dfs -get' to stdout
Hello,
Is there a reason why 'hadoop dfs -get' will not output to stdout?
I see 'hadoop dfs -put' can handle stdin. It would seem that dfs should
also support outputting to stdout.
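For reference, this is the stdin behaviour I mean: as far as I can tell,
'-put' reads from stdin when the source is '-' (the path below is just an
example):

echo "hello" | hadoop dfs -put - /user/hadoop-user/hello.txt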
thanks,
alex
Re: Output 'hadoop dfs -get' to stdout
Posted by Alex Parvulescu <al...@gmail.com>.
Hello Olivier,
It works like a charm :)
While we are on the subject, I sent an email to
common-user@hadoop.apache.org about hdfs that went unanswered. I'll
reproduce it here, as I think this is a better place for it:
I want to achieve the 'hadoop dfs -getmerge' functionality over HTTP. The
closest thing I could find is the 'Download this file' link, but it is
available only for individual parts, not for the whole directory (
http://hadoop:50075/streamFile?filename=%2Fuser%2Fhadoop-user%2Foutput%2Fsolr%2F%2Fpart-00000
)
It seems that you can push a CSV URL to Solr 1.4, that is, a link to the
actual CSV file. The problem is that the HDFS-over-HTTP interface does not
make a directory available for download as a merged file, just the
individual parts.
As all the pieces are already there, it doesn't make sense to me to add an
HTTP (Apache?) server to the mix just to serve the processed files. I
should be able to do that with a special URL or something, maybe along the
lines of ... /streamMergedFile?whateverPathToAFileOrDir
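In the meantime, the best workaround sketch I can come up with is fetching
each part through the existing streamFile servlet and concatenating on the
client side (host, port and part names below are just the ones from my
listing, so treat this as an illustration only):

for part in part-00000 part-00001; do
  curl -s "http://hadoop:50075/streamFile?filename=/user/hadoop-user/output/solr/$part"
done > merged.csv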
As you can see it's related to my initial question on this thread :)
thanks for your time,
alex
Re: Output 'hadoop dfs -get' to stdout
Posted by Varene Olivier <va...@echo.fr>.
Supposing you do have your part-r-XXXX files fully ordered, you can do:
hadoop dfs -cat "output/solr/part-*" > yourLocalFile
tada :)
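And since '-cat' writes to stdout, you can also pipe the merged stream
straight into another command instead of a file, for example (gzip here is
just an illustration):

hadoop dfs -cat "output/solr/part-*" | gzip > yourLocalFile.gz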
Cheers
Olivier
Re: Output 'hadoop dfs -get' to stdout
Posted by Alex Parvulescu <al...@gmail.com>.
Hello,
one minor correction.
I'm talking about 'hadoop dfs -getmerge'. You are right: '-cat' is the
equivalent of '-get', and they both handle only files.
I'd like to see an equivalent of 'getmerge' to stdout.
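Something like this hypothetical invocation, say with '-' standing for
stdout (to be clear, this flag does not exist today, it's just what I'd
like to see):

hadoop dfs -getmerge output/solr -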
sorry for the confusion
alex
Re: Output 'hadoop dfs -get' to stdout
Posted by Alex Parvulescu <al...@gmail.com>.
Hello Olivier,
I've tried 'cat'. This is the error I get: 'cat: Source must be a file.'
This happens when I try to get all parts from a directory as a single .csv
file.
Something like this:
hadoop dfs -cat hdfs://master:54310/user/hadoop-user/output/solr/
cat: Source must be a file.
This is what the dir looks like:
hadoop dfs -ls hdfs://master:54310/user/hadoop-user/output/solr/
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2010-03-12 16:36 /user/hadoop-user/output/solr/_logs
-rw-r--r-- 2 hadoop supergroup 64882566 2010-03-12 16:36 /user/hadoop-user/output/solr/part-00000
-rw-r--r-- 2 hadoop supergroup 51388943 2010-03-12 16:36 /user/hadoop-user/output/solr/part-00001
It seems -get can merge everything into one file but cannot output to
stdout, while 'cat' can write to stdout but seems to require fetching the
parts one by one.
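The closest workaround I can see is to merge to a temporary local file
first and then cat that (the temp path is just an example, and I'm
assuming getmerge skips the _logs subdirectory):

hadoop dfs -getmerge /user/hadoop-user/output/solr /tmp/solr.csv && cat /tmp/solr.csv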
Or am I missing something?
thanks,
alex
Re: Output 'hadoop dfs -get' to stdout
Posted by Varene Olivier <va...@echo.fr>.
Hello Alex,
get writes a file to your local filesystem
hadoop dfs [-get [-ignoreCrc] [-crc] <src> <localdst>]
with
src : your file in your hdfs
localdst : the name of the file on your local filesystem that receives
the collected data from src
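For instance (the names are illustrative):

hadoop dfs -get /user/hadoop-user/output/solr/part-00000 ./part-00000.csv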
To get the results to STDOUT,
you can use cat
hadoop dfs [-cat <src>]
with src : your file in your hdfs
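For example, to stream a result file and page through it (the path is
just an illustration):

hadoop dfs -cat /user/hadoop-user/output/solr/part-00000 | less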
Regards
Olivier