You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Lucene/Solr QA (JIRA)" <ji...@apache.org> on 2018/09/03 09:05:00 UTC

[jira] [Commented] (LUCENE-8476) Bug fix and optimizations in UserDictionary(KoreanAnalyzer)

    [ https://issues.apache.org/jira/browse/LUCENE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601915#comment-16601915 ] 

Lucene/Solr QA commented on LUCENE-8476:
----------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 15s{color} | {color:green} nori in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m 27s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8476 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938050/LUCENE-8476.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-130-generic #156~14.04.1-Ubuntu SMP Thu Jun 14 13:51:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / d93c46e |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_172 |
|  Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/82/testReport/ |
| modules | C: lucene/analysis/nori U: lucene/analysis/nori |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/82/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



>  Bug fix and optimizations in UserDictionary(KoreanAnalyzer)
> ------------------------------------------------------------
>
>                 Key: LUCENE-8476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8476
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Namgyu Kim
>            Priority: Major
>              Labels: bugfix, memory-leak, optimization, patch-available
>         Attachments: LUCENE-8476.patch
>
>
> ■ Bug fix
> 1) BufferedReader's close method is not called.
> {code:java}
> // Line 57 method
> public static UserDictionary open(Reader reader) throws IOException {
>   BufferedReader br = new BufferedReader(reader);
>   String line = null;
>   List<String> entries = new ArrayList<>();
>   // text + optional segmentations
>   while ((line = br.readLine()) != null) {
>     ...
>   }
>   if (entries.isEmpty()) {
>     return null;
>   } else {
>     return new UserDictionary(entries);
>   }
> }{code}
> If you look at the code above, there is no close() method for the "br" variable.
>  As I know, BufferedReader can cause a +memory leak+ if the close method is not called.
>  So I changed the code below.
> {code:java}
> // Line 57 method
> public static UserDictionary open(Reader reader) throws IOException {
>   String line = null;
>   List<String> entries = new ArrayList<>();
>   // text + optional segmentations
>   try (BufferedReader br = new BufferedReader(reader)) {
>     while ((line = br.readLine()) != null) {
>       ...
>     }
>   }
>   if (entries.isEmpty()) {
>     return null;
>   } else {
>     return new UserDictionary(entries);
>   }
> }
> {code}
> I solved this problem with "[try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]" method available since Java 7.
>  
> ■ Optimizations
> 1) Change from Collections.sort to List.sort (UserDictionary constructor)
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   final CharacterDefinition charDef = CharacterDefinition.getInstance();
>   Collections.sort(entries,
>       Comparator.comparing(e -> e.split("\\s+")[0]));
>   PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
>   ...
> }{code}
> List.sort in Java 8 is known to be faster than existing Collections.sort. ([http://ankitsambyal.blogspot.com/2014/03/difference-between-listsort-and.html]) So I changed the code below.
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   final CharacterDefinition charDef = CharacterDefinition.getInstance();
>   entries.sort(Comparator.comparing(e -> e.split("\\s+")[0]));
>   PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
>   ...
> }{code}
>  
> 2) Remove unnecessary null check (UserDictionary constructor)
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   ...
>   String lastToken = null;
>   ...
>   for (String entry : entries) {
>     String[] splits = entry.split("\\s+");
>     String token = splits[0];
>     if (lastToken != null && token.equals(lastToken)) {
>       continue;
>     }
>     char lastChar = entry.charAt(entry.length()-1);
>   ...
> }{code}
> Looking at this part of the code,
> {code:java}
> if (lastToken != null && token.equals(lastToken)) {
>   continue;
> }{code}
> A null check for lastToken is unnecessary.
>  Because the equals method of the String class internally performs a null check.
>  So I changed the code as below.
> {code:java}
> // Line 82 method
> private UserDictionary(List<String> entries) throws IOException {
>   ...
>   String lastToken = null;
>   ...
>   for (String entry : entries) {
>     String[] splits = entry.split("\\s+");
>     String token = splits[0];
>     if (token.equals(lastToken)) {
>       continue;
>     }
>     char lastChar = entry.charAt(entry.length()-1);
>   ...
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org