Posted to user@mahout.apache.org by Margusja <ma...@roo.ee> on 2014/03/18 09:41:26 UTC

Command line vector to sequence file

Hi

I am looking for a simple way to convert vectors to a sequence file from
the command line.
For example, I have a data.txt file that contains vectors:
1,1
2,1
1,2
2,2
3,3
8,8
8,9
9,8
9,9

So is there a command-line way to convert that into a sequence file?

I tried mahout seqdirectory, but afterwards hdfs dfs -text
output2/part-m-00000 gives me something like:
/data.txt    1,1
2,1
1,2
2,2
3,3
8,8
8,9
9,8
9,9

and, as I understand it, that is not the sequence file format.

I know there is a Java API, but I am looking for a command-line option.


-- 
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE "(serialNumber=37303140314)"
-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-----END PUBLIC KEY-----


Re: Command line vector to sequence file

Posted by Kevin Moulart <ke...@gmail.com>.
You're welcome!

Here's the repository, if need be:
https://github.com/kmoulart/hadoop_mahout_utils



Kévin Moulart


2014-03-18 10:00 GMT+01:00 Margusja <ma...@roo.ee>:

> Thank you, I am going to try it.

Re: Command line vector to sequence file

Posted by Margusja <ma...@roo.ee>.
Thank you, I am going to try it.

On 18/03/14 10:58, Kevin Moulart wrote:
> Hi,
>
> I did the same search a few weeks back and found that there is nothing in
> the current API to do that from the command line.
>
> However, I did write a Java program that transforms a CSV into a
> SequenceFile, which can be used to train a naive Bayes model (among other
> things).
>
> Here are the sources:
> https://gist.github.com/kmoulart/9616125
>
> You'll find all you need to build a runnable jar with dependencies and a
> proper command line (using JCommander).
> Both the sequential version and the MapReduce one are in the given files.
>
> If you're lazy, I'll put the whole Maven project on my GitHub later today.
>
> Hope it helps.
>
> Kévin Moulart


Re: Command line vector to sequence file

Posted by Kevin Moulart <ke...@gmail.com>.
Hi,

I did the same search a few weeks back and found that there is nothing in
the current API to do that from the command line.

However, I did write a Java program that transforms a CSV into a
SequenceFile, which can be used to train a naive Bayes model (among other
things).

Here are the sources:
https://gist.github.com/kmoulart/9616125

You'll find all you need to build a runnable jar with dependencies and a
proper command line (using JCommander).
Both the sequential version and the MapReduce one are in the given files.
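
For readers who only want the idea, here is a minimal sketch of the parsing half of such a conversion. It is not the code from the gist: the class and method names (CsvVectorParser, parseLine, parseAll) are hypothetical, and it uses only the Java standard library so it runs without Hadoop or Mahout. Writing the result out is only described in a comment, on the assumption that each array would be wrapped in a Mahout DenseVector/VectorWritable and appended to a Hadoop SequenceFile.Writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the gist: parses lines such as "8,9"
// into numeric vectors, the first step before writing them anywhere.
class CsvVectorParser {

    // Parses one comma-separated line into a double[];
    // whitespace around each value is ignored.
    static double[] parseLine(String line) {
        String[] fields = line.trim().split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    // Parses every non-blank line; blank lines are skipped.
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> vectors = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                vectors.add(parseLine(line));
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,1", "2,1", "8,9", "9,9");
        List<double[]> vectors = parseAll(lines);
        for (int i = 0; i < vectors.size(); i++) {
            // Assumption: with Hadoop and Mahout on the classpath, this is
            // where each array would be wrapped (a DenseVector inside a
            // VectorWritable) and appended to a SequenceFile.Writer under a
            // key of your choice. Here the vectors are only printed.
            System.out.println(i + " -> " + Arrays.toString(vectors.get(i)));
        }
    }
}
```

The parsing is kept separate from the output step so that the same helper could feed either a sequential writer or a mapper.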

If you're lazy, I'll put the whole Maven project on my GitHub later today.

Hope it helps.

Kévin Moulart

