Posted to mapreduce-user@hadoop.apache.org by Jean <la...@yahoo.fr> on 2014/12/19 22:59:28 UTC

Fast grep on hdfs files

Hello,
I want to be able to grep custom strings in a large number of files stored in HDFS.
The data to search is about 500 GB to 2 TB in total, split across roughly 50 to 200 files.

What would be the best way to get the fastest results for:
- the matching lines
- the names of the files containing the matching lines
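
To make the expected output concrete, here is a minimal local sketch in Python of the two results listed above. It is only an illustration of the output shape, not an HDFS or cluster solution: the function name grep_files and the sample file names are made up for this example, and the input is an in-memory dict rather than real HDFS files.

```python
import re
from typing import Dict, Iterable, List, Tuple


def grep_files(
    pattern: str, files: Dict[str, Iterable[str]]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Search every line of every file for the regex `pattern`.

    `files` maps a (hypothetical) file name to its lines.
    Returns two things:
      - a list of (file name, matching line) pairs
      - a list of the file names that had at least one match
    """
    regex = re.compile(pattern)
    matching_lines: List[Tuple[str, str]] = []
    matching_files: List[str] = []
    for name, lines in files.items():
        found = False
        for line in lines:
            text = line.rstrip("\n")
            if regex.search(text):
                matching_lines.append((name, text))
                found = True
        if found:
            matching_files.append(name)
    return matching_lines, matching_files


if __name__ == "__main__":
    # Illustrative data only; real input would be the files stored in HDFS.
    sample = {
        "logs/part-0001": ["INFO start\n", "ERROR disk full\n"],
        "logs/part-0002": ["INFO ok\n"],
    }
    lines, names = grep_files(r"ERROR", sample)
    for name, text in lines:
        print(f"{name}: {text}")
    print("files with matches:", names)
```

Whatever tool is used on the cluster, this is the pair of results I would like to get back quickly.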

I tested a MapReduce grep, but it is too slow for interactive use.

Do I need to index everything in Hive or Solr?
Would Spark be faster than MapReduce?
Thanks