Posted to dev@nutch.apache.org by "James Sullivan (JIRA)" <ji...@apache.org> on 2012/11/12 06:12:11 UTC

[jira] [Created] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

James Sullivan created NUTCH-1497:
-------------------------------------

             Summary: Better default gora-sql-mapping.xml with larger field sizes for MySQL
                 Key: NUTCH-1497
                 URL: https://issues.apache.org/jira/browse/NUTCH-1497
             Project: Nutch
          Issue Type: Improvement
          Components: storage
    Affects Versions: 2.2
         Environment: MySQL Backend
            Reporter: James Sullivan
            Priority: Minor


The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have attached a mapping that works better for MySQL: it takes slightly more space but can hold the larger fields needed for real-world use. It includes the patch from NUTCH-1490 and resolves the non-Unicode part of NUTCH-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without settling for a significantly degraded lowest common denominator. Should the user manually rename the attached file to gora-sql-mapping.xml, or is there a way to have Nutch use it automatically when MySQL is selected in other configuration files (ivy.xml or gora.properties)?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "Nathan Gass (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495269#comment-13495269 ] 

Nathan Gass commented on NUTCH-1497:
------------------------------------

Some comments about the differences from NUTCH-1490:

I renamed the column typ because it is oddly named; imho this should be done, but it is not actually specific to MySQL. Increasing the length of the column, on the other hand, is necessary, as I got truncation exceptions with the typ column set to length 32. Of course, if this should not happen, I can try to find out which URL was responsible for the exception to get at the root cause.

Setting outlinks to the same length as inlinks makes it unnecessarily large (at least once the maximum number of outlinks actually gets enforced in Nutch). With the patch in NUTCH-1490, Gora uses the column type mediumblob, whereas with this file it would use longblob. I have no idea whether this difference is significant.
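
For context on the mediumblob/longblob difference: MySQL's BLOB types differ only in the maximum number of bytes they can hold. The sketch below (in Python, purely for illustration) picks the smallest MySQL BLOB type for a requested length. The function name is hypothetical, and how Gora actually maps a configured length to a column type is not shown in this thread, so treat the selection logic as an assumption.

```python
# Maximum capacity in bytes of each MySQL BLOB type (from the MySQL manual).
MYSQL_BLOB_CAPACITY = [
    ("tinyblob", 2**8 - 1),
    ("blob", 2**16 - 1),
    ("mediumblob", 2**24 - 1),
    ("longblob", 2**32 - 1),
]


def smallest_blob_type(length):
    """Return the smallest MySQL BLOB type that can hold `length` bytes.

    Hypothetical helper: it illustrates the size thresholds only and is
    not taken from the Gora source.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    for name, capacity in MYSQL_BLOB_CAPACITY:
        if length <= capacity:
            return name
    raise ValueError("length exceeds the capacity of longblob")
```

Under this rule, a configured length of up to 16777215 bytes fits in a mediumblob, and anything larger needs a longblob.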

Increasing the maximum length of URLs and titles only makes the truncation errors occur less frequently. A real fix is to enforce the given maximum length with appropriate checks in the Nutch code.
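
A minimal sketch of such a check, written in Python for brevity (Nutch itself is Java). The field names and limits are hypothetical placeholders; in practice they would have to match the lengths configured in gora-sql-mapping.xml.

```python
# Hypothetical per-field limits; the real values would come from the
# lengths configured in gora-sql-mapping.xml.
MAX_LENGTHS = {"title": 2048, "typ": 32}


def enforce_max_length(field, value, limits=MAX_LENGTHS):
    """Truncate `value` to the limit configured for `field`, if any.

    Fields without a configured limit, and None values, pass through
    unchanged.
    """
    limit = limits.get(field)
    if limit is None or value is None or len(value) <= limit:
        return value
    return value[:limit]
```

Truncating is probably acceptable for a title. For a URL, which identifies the page, rejecting the record may be the safer choice.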

                
> Better default gora-sql-mapping.xml with larger field sizes for MySQL
> ---------------------------------------------------------------------
>
>                 Key: NUTCH-1497
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1497
>             Project: Nutch
>          Issue Type: Improvement
>          Components: storage
>    Affects Versions: 2.2
>         Environment: MySQL Backend
>            Reporter: James Sullivan
>            Priority: Minor
>              Labels: MySQL
>         Attachments: gora-mysql-mapping.xml
>
>
> The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have attached a mapping that works better for MySQL: it takes slightly more space but can hold the larger fields needed for real-world use. It includes the patch from NUTCH-1490 and resolves the non-Unicode part of NUTCH-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without settling for a significantly degraded lowest common denominator. Should the user manually rename the attached file to gora-sql-mapping.xml, or is there a way to have Nutch use it automatically when MySQL is selected in other configuration files (ivy.xml or gora.properties)?


[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496131#comment-13496131 ] 

James Sullivan commented on NUTCH-1497:
---------------------------------------

I agree that one standard file for SQL databases would be preferable, but one example of why I couldn't stay with one file for both hsqldb and MySQL is that the text column was being turned into a blob, not text, at larger sizes.
                

[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Sullivan updated NUTCH-1497:
----------------------------------

    Patch Info:   (was: Patch Available)
    

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "Nathan Gass (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495272#comment-13495272 ] 

Nathan Gass commented on NUTCH-1497:
------------------------------------

It seems to me that Gora should hide as many database specifics as possible, and Gora actually seems to do this just fine in this case (using MySQL-specific column types when the length is large). What are the disadvantages of using the new length values with hsqldb?

                

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496954#comment-13496954 ] 

James Sullivan commented on NUTCH-1497:
---------------------------------------

I have attached it as a patch. MySQL users would still need to rename it to gora-sql-mapping.xml in order to use it. 
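
The manual rename could be scripted along these lines. This is only a sketch: the function name is made up, and `conf_dir` is assumed to be the Nutch configuration directory that holds both mapping files.

```python
import shutil
from pathlib import Path


def install_mysql_mapping(conf_dir, source_name="gora-mysql-mapping.xml"):
    """Copy the MySQL mapping over gora-sql-mapping.xml, keeping a backup.

    `conf_dir` is assumed to be the Nutch configuration directory that
    contains both files; adjust it to your installation.
    """
    conf = Path(conf_dir)
    source = conf / source_name
    target = conf / "gora-sql-mapping.xml"
    if not source.is_file():
        raise FileNotFoundError(source)
    if target.exists():
        # Keep the generic default so the change can be undone.
        shutil.copy2(target, conf / "gora-sql-mapping.xml.orig")
    shutil.copy2(source, target)
    return target
```

Copying rather than renaming leaves the MySQL mapping in place under its own name, and the `.orig` backup preserves the generic default.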
                

[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Sullivan updated NUTCH-1497:
----------------------------------

    Patch Info: Patch Available
    

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496122#comment-13496122 ] 

James Sullivan commented on NUTCH-1497:
---------------------------------------

Nathan, I've made the changes to the lengths and uploaded the file. Could you check that it is correct? One note: I left the column as typ because, although I agree it is odd, I thought consistency was more important.
                

[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Sullivan updated NUTCH-1497:
----------------------------------

    Attachment: gora-mysql-mapping.xml
    

[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Sullivan updated NUTCH-1497:
----------------------------------

    Attachment: gora-mysql-mapping-patch
    

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496959#comment-13496959 ] 

James Sullivan commented on NUTCH-1497:
---------------------------------------

I have attached it as a patch. Sorry it took so long. As it stands, MySQL
users will still have to rename it to gora-sql-mapping.xml in order to use
it.






                

[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "James Sullivan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Sullivan updated NUTCH-1497:
----------------------------------

    Attachment: gora-mysql-mapping.xml
    

[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495243#comment-13495243 ] 

Lewis John McGibbney commented on NUTCH-1497:
---------------------------------------------

Hi James, this is great. Is it possible for you to submit a patch? If a patch is applied, it is much easier for us to track the provenance through the viewvc system, as opposed to a complete file change.
                