Posted to user@mahout.apache.org by Raghuveer <al...@yahoo.com.INVALID> on 2015/03/04 05:52:44 UTC

mahout output of seq2sparse is empty

I have a data file in the following format:

src_ip, dest_ip, packet, bytes_transferred, src_port, dest_port, start_timestamp
71.105.62.168, 38.106.70.147, 1, 54, 55704, 52747, 1341775056478
38.106.70.147, 71.105.62.168, 2, 1568, 52747, 55704, 1341775056478

First, from what I have read, text fields such as src_ip should be converted to numbers. How can I do this?

I ran the following commands successfully, without any errors:

./mahout seqdirectory --input /upload/20120708-0031-0060.csv --output /upload/output1
./mahout seq2sparse -i /upload/output1 -o /upload/output2

But the output is empty:

drwxr-xr-x   - admin supergroup     0  /upload/output4/df-count
-rw-r--r--   2 admin supergroup  2008  /upload/output4/dictionary.file-0
-rw-r--r--   2 admin supergroup  1593  /upload/output4/frequency.file-0
drwxr-xr-x   - admin supergroup     0  /upload/output4/tf-vectors
drwxr-xr-x   - admin supergroup     0  /upload/output4/tfidf-vectors
drwxr-xr-x   - admin supergroup     0  /upload/output4/tokenized-documents
drwxr-xr-x   - admin supergroup     0  /upload/output4/wordcount

I should have got something in tfidf-vectors, shouldn't I? Can you kindly suggest what I am missing? Thanks in advance.
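
On the question above about turning src_ip into a number: one common option is to pack the four octets of a dotted IPv4 address into a single 32-bit integer. The following is only a minimal sketch in POSIX shell, not something from Mahout itself. The function names ip_to_int and row_to_numeric are hypothetical, and it assumes the first two columns of each row are well-formed IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helper: pack a dotted IPv4 address into one integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Hypothetical helper: rewrite one CSV row so both IP columns are numeric.
# Assumes the row has the layout shown in the post (src_ip, dest_ip, ...).
row_to_numeric() {
  row=$(printf '%s' "$1" | tr -d ' ')   # drop the spaces after commas
  src=${row%%,*}                        # first field
  rest=${row#*,}
  dst=${rest%%,*}                       # second field
  rest=${rest#*,}                       # remaining fields, left unchanged
  printf '%s,%s,%s\n' "$(ip_to_int "$src")" "$(ip_to_int "$dst")" "$rest"
}

row_to_numeric "71.105.62.168, 38.106.70.147, 1, 54, 55704, 52747, 1341775056478"
# prints 1198079656,644499091,1,54,55704,52747,1341775056478
```

Whether a single integer is a sensible encoding depends on the algorithm used afterwards; it is shown here only because it is simple and reversible.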

Re: mahout output of seq2sparse is empty

Posted by Suneel Marthi <su...@gmail.com>.
It depends on what you are trying to do. Are you trying classification or clustering?

On Wed, Mar 4, 2015 at 1:08 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> Yes, you are right, it was a directory. I see the part-m-00000 file. Can
> you kindly suggest how to run Mahout on this file? Should I run
> classification or clustering? Can you please share a sample?
> Thanks very much.

Re: mahout output of seq2sparse is empty

Posted by Raghuveer <al...@yahoo.com.INVALID>.
Yes, you are right, it was a directory. I see the part-m-00000 file. Can you kindly suggest how to run Mahout on this file? Should I run classification or clustering? Can you please share a sample?
Thanks very much.

On Wednesday, March 4, 2015 11:06 AM, Andrew Musselman <an...@gmail.com> wrote:

> I don't have a terminal in front of me but are you sure tfidf-vectors is a
> file, not a directory?


Re: mahout output of seq2sparse is empty

Posted by Andrew Musselman <an...@gmail.com>.
I don't have a terminal in front of me but are you sure tfidf-vectors is a
file, not a directory?
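
One way to check this is to list the directory and dump its contents. The sketch below only prints the two commands rather than running them. It assumes the hadoop and mahout command-line tools are on the PATH, that the installed Mahout version provides the seqdumper utility, and that the output lives under /upload/output4 as in the listing from the original post. The function name inspect_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for inspecting the tfidf-vectors
# directory under a given seq2sparse output directory.
inspect_cmds() {
  out=${1%/}   # drop a trailing slash, if any
  # List the part files inside the directory (a directory itself shows size 0).
  printf 'hadoop fs -ls %s/tfidf-vectors\n' "$out"
  # Dump the key/value pairs stored in the sequence files as text.
  printf 'mahout seqdumper -i %s/tfidf-vectors\n' "$out"
}

inspect_cmds /upload/output4
```

To actually run the commands on a cluster, the printed lines can be pasted into a terminal or piped to sh.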
