Posted to user@mahout.apache.org by Steven Cullens <sr...@gmail.com> on 2014/03/13 21:12:00 UTC

local file input for seqdirectory

Hi,

I have a large number of files, each on the order of kilobytes, on my local
machine that I want to convert to a sequence file on HDFS.  Whenever I try
to copy the local files to HDFS, Hadoop complains about bad blocks,
presumably because the block size is 64 MB and there are far more files than
available blocks.  In Mahout 0.7, I would tell it that the input files are local, like:

mahout seqdirectory -i file://<input directory> -o <HDFS directory>

But I can't use the same command in Mahout 0.9, which expects the file
system to be HDFS.  Is there a workaround for generating the sequence file
with Mahout 0.9?  Thanks.
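The small-files problem described above is exactly what sequence files address: many tiny files get packed into one large, splittable container keyed by filename. As a rough illustration of the idea only (plain Python with length-prefixed records, not the actual Hadoop SequenceFile binary format; `pack_dir` and `unpack` are hypothetical helpers), it looks like:

```python
import os
import struct
import tempfile

def pack_dir(src_dir, out_path):
    """Pack every file under src_dir into one container file as
    (name, payload) records -- conceptually what 'mahout seqdirectory'
    does with key=filename, value=file contents."""
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name), "rb") as fh:
                data = fh.read()
            key = name.encode("utf-8")
            # length-prefix the key and value so records can be re-read
            out.write(struct.pack(">II", len(key), len(data)))
            out.write(key)
            out.write(data)

def unpack(container_path):
    """Yield the (name, payload) records back out of the container."""
    with open(container_path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            klen, vlen = struct.unpack(">II", header)
            yield f.read(klen).decode("utf-8"), f.read(vlen)

# tiny demo: three small "documents" become one container file
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for i in range(3):
        with open(os.path.join(src, "doc%d.txt" % i), "w") as fh:
            fh.write("contents %d" % i)
    container = os.path.join(dst, "packed.seq")
    pack_dir(src, container)
    records = dict(unpack(container))
    print(len(records))
```

One container file means one set of HDFS blocks instead of thousands, which is why seqdirectory is the right tool here rather than copying the raw files.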

Steven

Re: local file input for seqdirectory

Posted by Steven Cullens <sr...@gmail.com>.
Thanks, Suneel.


On Thu, Mar 13, 2014 at 4:17 PM, Suneel Marthi <su...@yahoo.com>wrote:

> The workaround is to add -xm sequential. An MR version of seqdirectory was
> introduced in 0.8, and hence the default execution mode is MR if none is
> specified.

Re: local file input for seqdirectory

Posted by Suneel Marthi <su...@yahoo.com>.
The workaround is to add -xm sequential. An MR version of seqdirectory was introduced in 0.8, and hence the default execution mode is MR if none is specified.
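Putting the pieces together, the full invocation would look like the following sketch (same placeholders as in the original command; assumes a Mahout 0.9 installation with the mahout launcher on the PATH):

```shell
# -xm sequential forces local (non-MapReduce) execution, so the
# file:// input scheme works as it did in Mahout 0.7
mahout seqdirectory \
  -i file://<input directory> \
  -o <HDFS directory> \
  -xm sequential
```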
