Posted to dev@lucenenet.apache.org by "Neal Granroth (JIRA)" <ji...@apache.org> on 2011/01/07 16:34:46 UTC

[jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

    [ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978816#action_12978816 ] 

Neal Granroth commented on LUCENENET-380:
-----------------------------------------

Thanks Alex,

What would be the plan for handling the Sharpen artifacts that prevent the converted code from being built by the .NET SDK compiler?

Do you envision a post-conversion script to strip out statements like:
using Java.Lang
using Java.IO

and replace Sharpen-specific classes with standard .NET classes:
Sharpen.Collections.*
Sharpen.Runtime.*



> Evaluate Sharpen as a port tool
> -------------------------------
>
>                 Key: LUCENENET-380
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-380
>             Project: Lucene.Net
>          Issue Type: Task
>            Reporter: George Aroush
>         Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip, 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java, Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip, Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java, TestBufferedIndexInput.java, TestDateFilter.java
>
>
> This task is to evaluate Sharpen as a port tool for Lucene.Net.
> The files to be evaluated are attached.  We need to run those files (which are off Java Lucene 2.9.2) against Sharpen and compare the result against JLCA result.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

RE: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

Posted by Digy <di...@gmail.com>.

Hi Peter,

Most of them also existed in previous ports. This is why we have a huge
SupportClass.cs.
If you compare the source with Lucene.Net 2.9.2, you can see how they were
solved previously.

DIGY

-----Original Message-----
From: Peter Mateja [mailto:peter.mateja@gmail.com] 
Sent: Friday, January 07, 2011 7:53 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

Nice work Alex!

Not that this represents a solution, but I did load up the core source from
your conversion into a VS2010 project, then ran Resharper's code cleanup on
it.

This process took care of all the unused 'using Java.*' references, cleaned
up formatting, etc.  However, I'm still seeing a good many things that need
work:

1) ICloseable -> IDisposable, including refactoring of the implementation
from Close() to Dispose() (and also considering any additional refactoring
of the Disposable pattern.)
2) IFieldCache is marked as an interface, but has tons of static fields,
subclasses and interfaces.  This may be ok in Java, but not in C#.  Not sure
what the best course of action here might be... perhaps create an abstract
base class called FieldCache or FieldCacheBase to house this stuff, and pull
out the nested classes / interfaces into their own files.
3) Use of a generic WeakReference<>, which doesn't exist in generic form in
the .Net Framework.  This is something which could either be refactored or
implemented as generic.
4) ICloneable interface not implemented (see IndexInput.cs)
5) Unsigned bitwise shift assignment operator doesn't exist in C#.  See
IndexOutput.cs, WriteVInt() method.  The line i >>>= 7; in Java flags an
error in C#.  I'm not entirely sure, but I believe this can safely be
converted to: i >>= 7; in this case, especially given the comment that
negative numbers are not supported.
6) Use of Java DecimalFormat class.  An appropriate .Net replacement should
be easily substituted with some refactoring of the code.
7) Use of Runtime.IdentityHashCode().  Not sure how necessary this is.
8) Java specific value type parsing calls should be refactored to .Net (e.g.
double.ParseDouble() => double.Parse())
9) Use of the java ReadResolve() object serialization pattern needs to be
analyzed / refactored (see FieldCache.DefaultByteParser (or in the
translated version, IFieldCache._IByteParser)).
10) Use of Sharpen references.
11) Use of Java's NumberFormatException... should be refactored to use an
appropriate standard exception type (perhaps FormatException, though I'm not
sure this is appropriate) or create an internal Exception class for this
case.
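
As a rough illustration of item 5 (a sketch only, not code from Lucene or
Lucene.Net; the class and method names below are invented for this example),
the following Java snippet compares the signed and unsigned shifts and shows
one way to get the unsigned result from a signed shift on a widened value.
The two operators agree for non-negative input, so replacing >>>= with >>= is
only safe if the value can never be negative. With a negative value, >> keeps
the sign bit, and a loop that shifts until the upper bits are clear would not
terminate.

```java
// Hypothetical demo class; none of these names come from Lucene.
final class UnsignedShiftDemo {

    // Java's unsigned right shift: zero-fills from the left.
    static int unsignedShift(int value, int bits) {
        return value >>> bits;
    }

    // Same result using only a signed shift, by widening to long first.
    // Assumes 0 <= bits <= 31.
    static int emulatedUnsignedShift(int value, int bits) {
        long widened = value & 0xFFFFFFFFL; // the 32 bits read as an unsigned number
        return (int) (widened >> bits);
    }

    // Number of 7-bit groups needed for a variable-length encoding of value.
    // Relies on >>> so that negative input still reaches zero.
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(-1 >> 7);                      // -1 (sign bit preserved)
        System.out.println(unsignedShift(-1, 7));         // 33554431
        System.out.println(emulatedUnsignedShift(-1, 7)); // 33554431
        System.out.println(vIntLength(-1));               // 5
    }
}
```

On the C# side, the usual equivalent of the widening trick is to cast through
uint before shifting and cast back afterwards.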

There's plenty more build issues... I need to put this down for the rest of
the day, so I thought I'd at least get this out to the list.

Peter Mateja
peter.mateja@gmail.com


RE: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

Posted by Alex Thompson <pi...@hotmail.com>.

I agree some things will fall outside of the pure syntax mapping of a
conversion, but I think it's possible to have a more integrated pre/post than
the patch workflow laid out below.

First a little background on the internals of Sharpen, which uses an
abstract syntax tree (AST) approach. Overall it gets a Java AST (by using
Eclipse), then converts that to a C# AST, then renders that to C# text. So
the pre/post could be done when the code is in AST form, which should allow
things to be more programmatic and generalized. The specific pre/post item
settings could be stored in the Sharpen config file (which already has
options to adjust mappings), which would be maintained version to version. So
overall the process would be automated into one runnable step but there
would be scope specific settings as needed.
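
To illustrate the idea of doing the pre/post work while the code is still a
tree, here is a toy sketch in Java. It does not use Sharpen's or Eclipse's
actual classes or config format; the Call node, the mapping table, and the
method names are all made up for this example. It only shows the shape of the
approach: a rule from a settings table is applied to a structured node, and
text is rendered as the last step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not Sharpen's real data model.
final class AstMappingSketch {

    // A deliberately tiny "AST": one node type for a static method call.
    static final class Call {
        final String type;
        final String method;
        final String argument;

        Call(String type, String method, String argument) {
            this.type = type;
            this.method = method;
            this.argument = argument;
        }
    }

    // Stand-in for mapping entries that would live in a config file.
    static Map<String, String> defaultMappings() {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("Double.parseDouble", "double.Parse");
        mappings.put("Integer.parseInt", "int.Parse");
        return mappings;
    }

    // Tree-to-tree step: rewrite the node while it is still structured data.
    static Call convert(Call javaCall, Map<String, String> mappings) {
        String mapped = mappings.get(javaCall.type + "." + javaCall.method);
        if (mapped == null) {
            return javaCall; // no rule configured: pass through unchanged
        }
        int dot = mapped.lastIndexOf('.');
        return new Call(mapped.substring(0, dot), mapped.substring(dot + 1), javaCall.argument);
    }

    // Final step: render the converted tree to text.
    static String render(Call call) {
        return call.type + "." + call.method + "(" + call.argument + ")";
    }

    public static void main(String[] args) {
        Call javaCall = new Call("Double", "parseDouble", "text");
        System.out.println(render(convert(javaCall, defaultMappings()))); // double.Parse(text)
    }
}
```

Because the rule is applied to the node rather than to rendered text, it does
not depend on formatting or line breaks, which is the main advantage over a
text-based patch.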

Alex

-----Original Message-----
From: Peter Mateja [mailto:peter.mateja@gmail.com] 
Sent: Monday, January 10, 2011 2:43 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

The amount of custom work required for the conversion is starting to concern
me a bit.  Well, to clarify, the work itself doesn't concern me, but rather
I'm worried that this is going to make a purely automated conversion process
very difficult to pull off and probably very fragile.  The devil is
definitely in the details.

What are thoughts concerning how we can begin to tackle this?

How many of these issues can be handled by Sharpen, or a modified, custom
version of Sharpen?  What items are best handled by a pre/post processor?

A number of the items DIGY listed (thanks!) seem to fall under the scope of
"code intent", vs pure syntactical mapping.  I'd suggest that it's
unrealistic to expect any conversion tool to manage those types of issues.

Perhaps a process such as the following should be our initial draft:

1) Start with Lucene.Java source, initially the latest 3.0.3 release.
2) Make specific hand coded changes to the java source code to assist with
certain automated conversion issues.  These changes should be expressed as a
set of patch files, to be automatically applied to the java source on
subsequent iterations of this process.  Any patch rejections should break
the build.  These patches should be maintained as new code updates come from
the java source.
3) Run an automated conversion tool (Sharpen most likely.)
4) Perform any desired post-processing to modify the source code structure,
set up project / solution files, etc.  Essentially, get the project into a
state where it's loadable by Visual Studio.  At this point there will be
errors (lots of them).  The output of this step should be checked in as the
raw conversion source.
5) Make changes to the converted C# code, including necessary helper
classes, in order to fix all the remaining issues alluded to by DIGY.  Also,
run any automated post processing, such as Resharper code formatting (the
formatting settings should be standardized across the project to ensure
normalized and repeatable refactorings), inline docs tweaks, etc.  These
changes should also be expressed as a set of patch files, to be
automatically applied to the raw conversion source on subsequent iterations
of this process.  Any patch rejections should break the build.  These
patches should represent the bulk of the efforts of the Lucene.Net core dev
team.  The output of this step should be checked in as the official
Lucene.Net source code.
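
The fail-fast behaviour described in steps 1-5 (any patch rejection breaks
the build) could be sketched roughly as follows. This is a hypothetical
outline in Java, not an existing build script: the step names and the boolean
placeholders stand in for the real fetch, patch, convert, and post-process
commands, which this thread does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the proposed workflow; every name here is invented.
final class ConversionPipelineSketch {

    // Runs the steps in insertion order; the first failing step aborts the run.
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            if (!step.getValue().getAsBoolean()) {
                throw new IllegalStateException("Build broken at step: " + step.getKey());
            }
            completed.add(step.getKey());
        }
        return completed;
    }

    // Placeholder steps mirroring the numbered list; each returns true on success.
    // javaPatchesApply simulates whether the Java-side patches applied cleanly.
    static Map<String, BooleanSupplier> steps(boolean javaPatchesApply) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("1 fetch Java source", () -> true);
        steps.put("2 apply Java-side patches", () -> javaPatchesApply);
        steps.put("3 run converter", () -> true);
        steps.put("4 post-process raw conversion", () -> true);
        steps.put("5 apply C#-side patches", () -> true);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(run(steps(true)).size()); // 5
    }
}
```

Stopping at the first rejected patch keeps a stale patch set from silently
producing a raw conversion that no longer matches the Java source.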

This entire process needs to be checked into a conversion process branch.
After the initial build of this system, workflow would be split into the
following 2 vectors:
A) On Java source changes (probably at a coarser level than individual
commits), steps 1-4 would be run to build a new base raw conversion source.
With the Java changes, it's possible that changes to the patch files in
step 2 would be required.  Then step 5 would be run to create the official
Lucene.Net source.  Again, fixes to the patches may be in order depending on
the complexity of the original Java changes.
B) Most other changes would be considered C#-side specific.  This might
involve platform-specific bug fixes, desired code refactorings, etc.  These
changes would be made based on the current checked-in Lucene.Net source, and
the patch files for step 5 would be updated to reflect those changes.

Conversion process changes would fall outside the scope of standard
development, being fairly disruptive.

Of course, this process does complicate the development / maintenance
process quite a bit, by making many more vectors of change.  And, I'm aware
that what I've blathered on about here has probably already been discussed,
but I wanted to get some discussion going.  Thoughts?

Peter Mateja
peter.mateja@gmail.com

On Sun, Jan 9, 2011 at 4:09 PM, Digy <di...@gmail.com> wrote:

RE: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

Posted by Digy <di...@gmail.com>.

Having a "buildable" & "clean" code is just a beginning and should not
result in lost of know-hows.
Before trying to fix the bugs of the output of these tools, everyone should
see how they were fixed in Lucene.Net 2.9.2.
There is no need to reinvent the wheel.

Here is a quick list of tips & tricks as far as I can remember. 

* Decimal separator is not always ".", some locales use "," (while parsing
float/double).
* "Set" in Java accepts "null" as argument.  A null-control is needed while
porting.
* ReadResolve should be ported by implementing the interface
"System.Runtime.Serialization.IObjectReference"
	public Object
GetRealObject(System.Runtime.Serialization.StreamingContext context)
        {
            return ReadResolve();
        }
* .NET emits "\ufffd" as invalid char but java as "\x00"
* Use StringComparer.Ordinal while comparing strings.
* FIPS compliance.  use SHA1 instead of MD5
* Use "System.Runtime.Serialization.OnDeserialized" attribute on
Serializable classes.
	void OnDeserialized(System.Runtime.Serialization.StreamingContext
context)
        {
            -----
        }
* Use "System.IO.Path.DirectorySeparatorChar" or "Path.Combine" instead of
using "\\". (causes problems on Mono)
* Iteration problems.  "if (i.MoveNext()){...}" can not be used (in a while
loop)  to detect the end of the list.
* Port of TreeSet. TreeSet in Java sorts its contents based on the default
Comparator of the items, but the ArrayList does not.
* Unexpected results when writing custom analyzers. Override
Read,ReadBlock,ReadLine,Peek,ReadToEnd in ReusableStringReader.
* Multi-dimensional arrays: "length" in java returns the number of
dimensions. In c# "Length" returns the total number of elements in all
dimensions.
* Copy private fields in the class' "Clone" method.
* Don't forget: base-36-encoding is used in filenames.
* Use "if (dataLen <=0 )" instead of  "if (dataLen == -1)" to detect end of
stream.
* Case insensitivity. Don't use public names such as "text" and "Text" in a
single class (problem for VB users).
* Use ThreadClass in SupportClass.cs instead of System.Threading.Thread
* Use "System.Text.Encoding.UTF8" instead of "System.Text.Encoding.ASCII"
* ">>>" is already implemented in SupportClass.
* Threshold differences between .NET & Java while comparing floats/doubles.
---- Also use these classes:
* There is a good implementation of WeakHashTable in SupportClass. (needs
"Generics")
* There is a very fast LRU cache impl. (SimpleLRUCache). (needs "Generics")
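
As an illustration of the decimal-separator and StringComparer.Ordinal tips
above, here is a minimal sketch. The class name and the sample values are made
up for the example and are not taken from Lucene.Net; it assumes only the
standard System and System.Globalization APIs.

```csharp
using System;
using System.Globalization;

class CultureSketch
{
    static void Main()
    {
        // Parse with the invariant culture so "." is always the decimal
        // separator, whatever the locale of the machine running the code.
        double boost = double.Parse("1.5", CultureInfo.InvariantCulture);

        // Format the same way when writing values that will be parsed later.
        string text = boost.ToString(CultureInfo.InvariantCulture);
        Console.WriteLine(text);                      // 1.5

        // Java's String.compareTo compares UTF-16 code units; the closest
        // .NET equivalent is an ordinal comparison, not the culture-sensitive
        // default comparison.
        string[] terms = { "b", "B", "a" };
        Array.Sort(terms, StringComparer.Ordinal);
        Console.WriteLine(string.Join(",", terms));   // B,a,b
    }
}
```

With a culture-sensitive sort the order of "B" and "a" may differ, which is
the kind of mismatch with the Java behavior that the tip is warning about.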

PS: This is not a complete list and there may be many others from other
contributors to Lucene.Net
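
Two of the tips in the list (multi-dimensional arrays and iteration) can be
sketched as follows. This is only an illustration under the assumption of a
rectangular C# array and a plain List<string>; none of the names come from
the Lucene.Net sources.

```csharp
using System;
using System.Collections.Generic;

class PortingSketch
{
    static void Main()
    {
        // Java: new int[3][4].length == 3 (size of the first dimension).
        // C#:   Length counts every element of a rectangular array.
        int[,] grid = new int[3, 4];
        Console.WriteLine(grid.Length);        // 12
        Console.WriteLine(grid.GetLength(0));  // 3

        // MoveNext() tests for the end AND advances, so call it exactly once
        // per element and read the element from Current.
        List<string> terms = new List<string> { "a", "b", "c" };
        int visited = 0;
        using (IEnumerator<string> it = terms.GetEnumerator())
        {
            while (it.MoveNext())
            {
                Console.WriteLine(it.Current);
                visited++;
            }
        }
        Console.WriteLine(visited);            // 3
    }
}
```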

DIGY

-----Original Message-----
From: Peter Mateja [mailto:peter.mateja@gmail.com] 
Sent: Friday, January 07, 2011 7:53 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port
tool

Nice work Alex!

Not that this represents a solution, but I did load up the core source from
your conversion into a VS2010 project, then ran ReSharper's code cleanup on
it.

This process took care of all the unused 'using Java.*' references, cleaned
up formatting, etc.  However, I'm still seeing a good many things that need
work:

1) ICloseable -> IDisposable, including refactoring of the implementation
from Close() to Dispose() (and also considering any additional refactoring
of the Disposable pattern.)
2) IFieldCache is marked as an interface, but has tons of static fields,
subclasses and interfaces.  This may be ok in Java, but not in C#.  Not sure
what the best course of action here might be... perhaps create an abstract
base class called FieldCache or FieldCacheBase to house this stuff, and pull
out the nested classes / interfaces into their own files.
3) Use of a generic WeakReference<>, which doesn't exist in generic form in
the .Net Framework.  This is something which could either be refactored or
implemented as generic.
4) ICloneable interface not implemented (see IndexInput.cs)
5) Unsigned bitwise shift assignment operator doesn't exist in C#.  See
IndexOutput.cs, WriteVInt() method.  The line i >>>= 7; in Java flags an
error in C#.  I'm not entirely sure, but I believe this can safely be
converted to: i >>= 7; in this case, especially given the comment that
negative numbers are not supported.
6) Use of Java DecimalFormat class.  An appropriate .Net replacement should
be easily substituted with some refactoring of the code.
7) Use of Runtime.IdentityHashCode().  Not sure how necessary this is.
8) Java specific value type parsing calls should be refactored to .Net (e.g.
double.ParseDouble() => double.Parse())
9) Use of the java ReadResolve() object serialization pattern needs to be
analyzed / refactored (see FieldCache.DefaultByteParser (or in the
translated version, IFieldCache._IByteParser)).
10) Use of Sharpen references.
11) Use of Java's NumberFormatException... should be refactored to use an
appropriate standard exception type (perhaps FormatException, though I'm not
sure this is appropriate) or create an internal Exception class for this
case.

There are plenty more build issues... I need to put this down for the rest of
the day, so I thought I'd at least get this out to the list.

Peter Mateja
peter.mateja@gmail.com


Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

Posted by Peter Mateja <pe...@gmail.com>.

Nice work Alex!

Not that this represents a solution, but I did load up the core source from
your conversion into a VS2010 project, then ran ReSharper's code cleanup on
it.

This process took care of all the unused 'using Java.*' references, cleaned
up formatting, etc.  However, I'm still seeing a good many things that need
work:

1) ICloseable -> IDisposable, including refactoring of the implementation
from Close() to Dispose() (and also considering any additional refactoring
of the Disposable pattern.)
2) IFieldCache is marked as an interface, but has tons of static fields,
subclasses and interfaces.  This may be ok in Java, but not in C#.  Not sure
what the best course of action here might be... perhaps create an abstract
base class called FieldCache or FieldCacheBase to house this stuff, and pull
out the nested classes / interfaces into their own files.
3) Use of a generic WeakReference<>, which doesn't exist in generic form in
the .Net Framework.  This is something which could either be refactored or
implemented as generic.
4) ICloneable interface not implemented (see IndexInput.cs)
5) Unsigned bitwise shift assignment operator doesn't exist in C#.  See
IndexOutput.cs, WriteVInt() method.  The line i >>>= 7; in Java flags an
error in C#.  I'm not entirely sure, but I believe this can safely be
converted to: i >>= 7; in this case, especially given the comment that
negative numbers are not supported.
6) Use of Java DecimalFormat class.  An appropriate .Net replacement should
be easily substituted with some refactoring of the code.
7) Use of Runtime.IdentityHashCode().  Not sure how necessary this is.
8) Java-specific value type parsing calls should be refactored to .Net (e.g.
double.ParseDouble() => double.Parse())
9) Use of the Java ReadResolve() object serialization pattern needs to be
analyzed / refactored (see FieldCache.DefaultByteParser (or in the
translated version, IFieldCache._IByteParser)).
10) Use of Sharpen references.
11) Use of Java's NumberFormatException... should be refactored to use an
appropriate standard exception type (perhaps FormatException, though I'm not
sure this is appropriate) or create an internal Exception class for this
case.
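
For item 5, one possible way to keep the semantics of Java's ">>>=" is to do
the shift on a uint, so that zeros are shifted in regardless of sign. The
sketch below is an assumption about how a WriteVInt-style loop could look; the
method here is a hypothetical stand-in that collects bytes in a list and is
not the actual IndexOutput code.

```csharp
using System;
using System.Collections.Generic;

class VIntSketch
{
    // Hypothetical stand-in for IndexOutput.WriteVInt: returns the encoded
    // bytes instead of writing them to an output.
    static List<byte> WriteVInt(int i)
    {
        List<byte> bytes = new List<byte>();
        uint value = (uint)i;            // reinterpret so ">>" shifts in zeros
        while ((value & ~0x7Fu) != 0)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;                 // same effect as Java's i >>>= 7
        }
        bytes.Add((byte)value);
        return bytes;
    }

    static void Main()
    {
        Console.WriteLine(WriteVInt(300).Count);   // 2
        Console.WriteLine(WriteVInt(-1).Count);    // 5
    }
}
```

A plain "i >>= 7" on an int gives the same result for non-negative values, but
for a negative value the sign bit is copied in on every shift and a loop of
this shape would never terminate, so the uint form is the more cautious choice.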

There are plenty more build issues... I need to put this down for the rest of
the day, so I thought I'd at least get this out to the list.
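
As a rough sketch of item 1 in the list above (ICloseable -> IDisposable), a
ported class could implement IDisposable and keep Close() as a thin forwarder
for callers that still use the Java naming. PortedReader is a made-up name for
illustration, and the sketch leaves out finalizers and the rest of the full
Disposable pattern.

```csharp
using System;

// Hypothetical class standing in for a ported type that had close() in Java.
class PortedReader : IDisposable
{
    private bool closed;

    public bool IsClosed { get { return closed; } }

    // Kept for callers ported from Java; simply forwards to Dispose().
    public void Close() { Dispose(); }

    public void Dispose()
    {
        if (closed) return;   // safe to call more than once
        closed = true;        // release resources here
    }
}

class DisposeSketch
{
    static void Main()
    {
        PortedReader reader = new PortedReader();
        using (reader)
        {
            // work with the reader
        }
        Console.WriteLine(reader.IsClosed);   // True
    }
}
```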

Peter Mateja
peter.mateja@gmail.com
