Posted to user@mahout.apache.org by Claudia Grieco <gr...@crmpa.unisa.it> on 2011/01/26 18:44:39 UTC

problems saving and loading SGD classifications

Hi guys,

I'm using the LogisticModelParameters.saveTo(file) and
LogisticModelParameters.loadFrom(file) methods to save and load an SGD
model.

But I'm unable to load the model because after calling loadFrom I get the
following error: 

Failed parsing JSON source: java.io.FileReader@12d03f9 to Json

[.]

Caused by: com.google.gson.ParseException: Encountered "\"beta\"" at line 6, column 45.

Looking at the content of the model, I noticed that commas are missing
between some fields, for example:

"lr":{"mu0":1.0,"decayFactor":0.999,"stepOffset":10,
    "forgettingExponent":-0.5,"perTermAnnealingOffset":20,"step":189,
    "lambda":1.0,"sealed":true,"gradient":{}"beta"

Here a comma is missing between the "gradient" object and "beta".

But if I add all the missing commas, I still get the following error:

Exception in thread "main" java.lang.RuntimeException: No-args constructor
for interface org.apache.mahout.math.Vector does not exist. Register an
InstanceCreator with Gson for this type to fix this problem.
    at com.google.gson.MappedObjectConstructor.constructWithNoArgConstructor(MappedObjectConstructor.java:64)
    at com.google.gson.MappedObjectConstructor.construct(MappedObjectConstructor.java:53)
    at com.google.gson.JsonObjectDeserializationVisitor.constructTarget(JsonObjectDeserializationVisitor.java:41)
    at com.google.gson.JsonDeserializationVisitor.getTarget(JsonDeserializationVisitor.java:54)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:98)
    at com.google.gson.JsonDeserializationVisitor.visitChild(JsonDeserializationVisitor.java:87)
    at com.google.gson.JsonDeserializationVisitor.visitChildAsObject(JsonDeserializationVisitor.java:75)
    at com.google.gson.JsonObjectDeserializationVisitor.visitObjectField(JsonObjectDeserializationVisitor.java:62)
    at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
    at com.google.gson.JsonDeserializationVisitor.visitChild(JsonDeserializationVisitor.java:87)
    at com.google.gson.JsonDeserializationVisitor.visitChildAsObject(JsonDeserializationVisitor.java:75)
    at com.google.gson.JsonObjectDeserializationVisitor.visitObjectField(JsonObjectDeserializationVisitor.java:62)
    at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
    at com.google.gson.JsonDeserializationContextDefault.fromJsonObject(JsonDeserializationContextDefault.java:73)
    at com.google.gson.JsonDeserializationContextDefault.deserialize(JsonDeserializationContextDefault.java:49)
    at com.google.gson.Gson.fromJson(Gson.java:379)
    at com.google.gson.Gson.fromJson(Gson.java:352)
    at org.apache.mahout.classifier.sgd.LogisticModelParameters.loadFrom(LogisticModelParameters.java:141)
    at org.apache.mahout.classifier.sgd.LogisticModelParameters.loadFrom(LogisticModelParameters.java:154)

What do you think the problem could be?

Thanks

Claudia

                


Re: problems saving and loading SGD classifications

Posted by Ted Dunning <te...@gmail.com>.
Here is the issue that describes the problem and fix:

https://issues.apache.org/jira/browse/MAHOUT-556


Re: problems saving and loading SGD classifications

Posted by Ted Dunning <te...@gmail.com>.
Yes.  Trunk has that.

On Thu, Jan 27, 2011 at 6:59 AM, Ted Dunning <te...@gmail.com> wrote:

> TrainNewsGroups in the examples module does this.
>
> ModelSerializer supports JSON serialization as well as binary.  The JSON
> form breaks down for larger models because GSON parses recursively rather
> than iteratively.
>
> Trunk should have everything you need. (I will verify that in a moment)

Re: problems saving and loading SGD classifications

Posted by Ted Dunning <te...@gmail.com>.
TrainNewsGroups in the examples module does this.

ModelSerializer supports JSON serialization as well as binary.  The JSON
form breaks down for larger models because GSON parses recursively rather
than iteratively.

Trunk should have everything you need. (I will verify that in a moment)
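
To illustrate why per-element recursion becomes a problem as the input
grows, here is a minimal sketch using only the JDK. It does not reproduce
GSON's parser; the class and method names are made up for illustration. It
only contrasts a routine that uses one stack frame per element with a plain
loop over the same data.

```java
import java.util.Arrays;

final class RecursionDepthSketch {
    // Recursive walk: one stack frame per element, similar in spirit to a
    // parser that handles "first element, then the rest" by calling itself.
    static double sumRecursive(double[] values, int from) {
        if (from == values.length) {
            return 0.0;
        }
        return values[from] + sumRecursive(values, from + 1);
    }

    // Iterative walk: constant stack depth regardless of input length.
    static double sumIterative(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Reports whether the recursive walk ran out of stack on this input.
    static boolean recursiveOverflows(double[] values) {
        try {
            sumRecursive(values, 0);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        double[] small = new double[100];
        double[] large = new double[1_000_000];
        Arrays.fill(small, 1.0);
        Arrays.fill(large, 1.0);

        System.out.println("recursive, small input: " + sumRecursive(small, 0));
        System.out.println("iterative, large input: " + sumIterative(large));
        // With a default-sized thread stack the large input is expected to
        // exhaust the stack; the exact limit depends on JVM settings.
        System.out.println("recursive overflows on large input: "
            + recursiveOverflows(large));
    }
}
```

The loop handles any length the heap can hold, while the recursive version
is limited by thread stack size, which is typically far smaller.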

On Thu, Jan 27, 2011 at 12:38 AM, Claudia Grieco <gr...@crmpa.unisa.it> wrote:

> Are there any examples of how to use ModelSerializer?
> Can I use it without fixing Mahout from trunk?
> I see that ModelSerializer uses JSON too, doesn't it?

Re: problems saving and loading SGD classifications

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
Are there any examples of how to use ModelSerializer?
Can I use it without fixing Mahout from trunk?
I see that ModelSerializer uses JSON too, doesn't it?



Re: problems saving and loading SGD classifications

Posted by Ted Dunning <te...@gmail.com>.
This is a known problem that should be fixed in trunk.

While you are at it, the LogisticModelParameters approach may not be as
useful as the ModelSerializer approach.

Here is a comparison of pros and cons:

LogisticModelParameters

+ incorporates lots of CSV parsing info
+ serializes the whole lot including model and data representation
+ somewhat simpler to use
+ matches chapter 13 of MiA examples
-- uses json to serialize model
- pretty much assumes CSV input by implication
- has a bug in many recent versions

ModelSerializer

++ allows binary serialization
+ makes no assumptions about how feature vectors are encoded
- requires that you make your own arrangements for vector encoding
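
As a rough illustration of what binary serialization buys for a large
coefficient vector, here is a minimal sketch using only the JDK. It is not
ModelSerializer's actual file format; the layout (a length prefix followed
by raw doubles) and all names are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class BinaryWeightsSketch {
    // Writes a 4-byte length prefix followed by one 8-byte double per weight.
    static byte[] toBinary(double[] weights) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(weights.length);
            for (double w : weights) {
                out.writeDouble(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the same layout back with a simple loop; no text parsing involved.
    static double[] fromBinary(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            double[] weights = new double[in.readInt()];
            for (int i = 0; i < weights.length; i++) {
                weights[i] = in.readDouble();
            }
            return weights;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] beta = {0.25, -1.5, 0.0, 3.125};
        byte[] encoded = toBinary(beta);
        double[] decoded = fromBinary(encoded);
        System.out.println(encoded.length + " bytes, round trip ok: "
            + Arrays.equals(beta, decoded));
    }
}
```

The size is fixed at 4 + 8n bytes for n weights, and values round-trip
exactly because no decimal formatting or parsing is involved.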

The lack of binary serialization is (for me) a real show-stopper for LMP
with big models.  Almost as important is the vector encoding issue, since
real Mahout applications tend to have large, sparse, text-like input
variables.
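
One common way to make "your own arrangements" for encoding text-like
input is to hash tokens into a fixed number of slots. The sketch below
uses only the JDK and is not Mahout's encoder API; the class name, the use
of String.hashCode, and the whitespace tokenization are all assumptions
chosen to keep the example short.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class HashedEncodingSketch {
    // Maps each token to one of numFeatures slots and adds 1.0 to that slot.
    // Only non-zero slots are stored, so the result stays sparse even when
    // numFeatures is large.
    static Map<Integer, Double> encode(String text, int numFeatures) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            // floorMod keeps the slot non-negative for negative hash codes.
            int slot = Math.floorMod(token.hashCode(), numFeatures);
            sparse.merge(slot, 1.0, Double::sum);
        }
        return sparse;
    }

    public static void main(String[] args) {
        Map<Integer, Double> encoded =
            encode("the model loads the saved model", 1000);
        System.out.println(encoded);
    }
}
```

Because the vector size is fixed up front, the model never needs a
dictionary of terms, at the cost of occasional collisions between tokens
that hash to the same slot.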



On Wed, Jan 26, 2011 at 9:44 AM, Claudia Grieco <gr...@crmpa.unisa.it> wrote:

> What do you think the problem could be?
>