Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2011/05/26 23:53:47 UTC

[jira] [Created] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

DIH LineEntityProcessor support for delimited & fixed-width files
-----------------------------------------------------------------

                 Key: SOLR-2549
                 URL: https://issues.apache.org/jira/browse/SOLR-2549
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 4.0
            Reporter: James Dyer
            Priority: Minor
         Attachments: SOLR-2549.patch

Provides support for Fixed Width and Delimited Files without needing to write a Transformer. 

The following XML properties are supported with this version of LineEntityProcessor:

For fixed width files:
 - colDef[#]

For Delimited files:
 - fieldDelimiterRegex
 - firstLineHasFieldnames
 - delimitedFieldNames
 - delimitedFieldTypes

These properties are described in the API documentation. See patch.

When combined with the cache improvements from SOLR-2382, this allows you to join a flat-file entity with other entities (SQL, etc.).
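
As a rough illustration of the delimited case, here is a minimal Java sketch. It is not the code from the attached patch. It assumes that fieldDelimiterRegex is a regular expression used to split each line and that delimitedFieldNames supplies the column names in order; the class name, method name, and sample data are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: not the code from the attached patch.
class DelimitedLineSketch {

    // Splits one line on a delimiter regex and pairs each value with a field name.
    // The -1 limit keeps trailing empty columns instead of silently dropping them.
    static Map<String, String> parseLine(String line, Pattern delimiter, String[] fieldNames) {
        String[] values = delimiter.split(line, -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length && i < values.length; i++) {
            row.put(fieldNames[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Pattern pipe = Pattern.compile("\\|");      // assumed delimiter regex
        String[] names = {"ID", "NAME", "CITY"};    // assumed field names
        // Prints {ID=42, NAME=Acme Corp, CITY=}
        System.out.println(parseLine("42|Acme Corp|", pipe, names));
    }
}
```

Compiling the pattern once and reusing it for every line avoids recompiling the regex per row, which matters for large flat files.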

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "Pulkit Singhal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125170#comment-13125170 ] 

Pulkit Singhal commented on SOLR-2549:
--------------------------------------

@jdyer Can you please post some samples in the comments? The patch docs are good, but this would also be very helpful. If you don't mind :)
                


[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM:
---------------------------------------------------------------------

Patch for Solr 4.0.0 available: v400-SOLR-2549.patch


                
      was (Author: zakibenz):
    patch for solr 4.0.0 available 
                  


[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zakaria benzidalmal updated SOLR-2549:
--------------------------------------

    Attachment: v400-SOLR-2549.patch

Patch for Solr 4.0.0.
                


[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zakaria benzidalmal updated SOLR-2549:
--------------------------------------

    Attachment: SOLR-2549.patch

Fixes an NPE when the escape parameter is not specified.
                


[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:10 AM:
---------------------------------------------------------------------

Thanks to James for his help ;)
                
      was (Author: zakibenz):
    thanks to james for his help
                  


[jira] [Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493891#comment-13493891 ] 

zakaria benzidalmal edited comment on SOLR-2549 at 11/9/12 10:38 AM:
---------------------------------------------------------------------

Patch for Solr 4.0.0 available.
                
      was (Author: zakibenz):
    patch for solr 4.0.0
                  


[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2549:
-----------------------------

    Attachment: SOLR-2549.patch

Here is a version synced with the current trunk.



[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493870#comment-13493870 ] 

zakaria benzidalmal commented on SOLR-2549:
-------------------------------------------

Data config example:

<dataConfig>
        <dataSource name="URL" baseUrl="file:///c:/work/solr/example/example-DIH/solr/csv/in/" type="URLDataSource" />
        <document name="FixedWidthCounts">

                <!-- For delimited files. Other update/csv request handler
                     parameters can also be specified as attributes (...). -->
                <entity
                        name="sites"
                        processor="org.apache.solr.handler.dataimport.LineEntityProcessor"
                        dataSource="URL"
                        url="data.csv"
                        header="true"
                        separator=","
                />

                <!-- For fixed-width files. Further attributes (...) are elided. -->
                <entity
                        name="sites"
                        processor="org.apache.solr.handler.dataimport.LineEntityProcessor"
                        dataSource="URL"
                        url="data.csv"
                        colDef1="ID,0,6,STRING,0,LEFT"
                        colDef2="NAME,6,26,STRING,0,LEFT"
                />

        </document>
</dataConfig>
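
To make the fixed-width case concrete, here is a minimal Java sketch of slicing a line by column definitions shaped like the colDef values above. The interpretation is an assumption, not something confirmed by the patch docs: the first three comma-separated parts are read as field name, start offset, and exclusive end offset, and the remaining parts (STRING, 0, LEFT) are ignored. The class name, method name, and sample line are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the meaning assigned to the colDef parts below is an assumption.
class FixedWidthSketch {

    // Reads "NAME,start,end,..." and cuts line[start, end), trimming the padding.
    // Anything after the third comma-separated part is ignored in this sketch.
    static Map<String, String> parseLine(String line, String... colDefs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String colDef : colDefs) {
            String[] parts = colDef.split(",");
            String name = parts[0].trim();
            int start = Integer.parseInt(parts[1].trim());
            int end = Integer.parseInt(parts[2].trim());
            // Short lines are tolerated: clamp both offsets to the line length.
            int from = Math.min(start, line.length());
            int to = Math.min(end, line.length());
            row.put(name, line.substring(from, to).trim());
        }
        return row;
    }

    public static void main(String[] args) {
        // Build a 26-character line: 6-character ID, then NAME padded to 20 characters.
        String line = "000042" + String.format("%-20s", "Acme Corp");
        // Prints {ID=000042, NAME=Acme Corp}
        System.out.println(parseLine(line, "ID,0,6,STRING,0,LEFT", "NAME,6,26,STRING,0,LEFT"));
    }
}
```

If the second and third parts actually mean something else (for example offset and length), only the two parseInt lines and the substring call would need to change.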
                


[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "James Dyer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2549:
-----------------------------

    Attachment: SOLR-2549.patch

A long time ago, someone on the users' list asked for better support for delimited files. This version supports most of the same features as the CSVRequestHandler, using the same CSV parser and most of the same parameter names.

The reasons for using DIH instead of the CSVRequestHandler would be cases where the flat file needs to be joined to other entities, the data needs to be cached, and/or transformers need to be applied.

This patch also retains the same support for fixed-width files.

The unit tests have been enhanced to test these new possibilities.
                


[jira] [Issue Comment Edited] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "Pulkit Singhal (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125170#comment-13125170 ] 

Pulkit Singhal edited comment on SOLR-2549 at 10/11/11 10:54 PM:
-----------------------------------------------------------------

@jdyer Can you please post some data-config.xml samples in the comments?
The patch docs are good, but this would also be very helpful. If you don't mind :)
                
      was (Author: pulkitsinghal@gmail.com):
    @jdyer Can you please post some samples in the comments? The patch docs are good, but this would also be very helpful. If you don't mind :)
                  


[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "zakaria benzidalmal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493871#comment-13493871 ] 

zakaria benzidalmal commented on SOLR-2549:
-------------------------------------------

Thanks to James for his help.
                


[jira] [Commented] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "James Dyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168660#comment-13168660 ] 

James Dyer commented on SOLR-2549:
----------------------------------

The dependency here on SOLR-2943 is only for the "DIHCacheTypes" enum, which defines data types for each flat-file column. This is particularly helpful when joining to SQL data sources, as DIH requires the join keys to be the same type. It might be beneficial to rename the enum to "DIHType" or something more generic, should either issue become a candidate for commit.
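
To show why matching key types matter, here is a minimal Java sketch. It does not use DIH or the real "DIHCacheTypes" enum; the ColumnType enum, its constants, and the lookup map are hypothetical stand-ins. It assumes the SQL side hands back an Integer key while the flat file yields text.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in for a keyed join, not DIH's cache implementation.
class JoinKeyTypeSketch {

    // Minimal stand-in for a typed column: converts raw file text to a Java type.
    // The enum and its constants are hypothetical, not the patch's DIHCacheTypes.
    enum ColumnType {
        STRING { Object convert(String raw) { return raw; } },
        INTEGER { Object convert(String raw) { return Integer.valueOf(raw.trim()); } };

        abstract Object convert(String raw);
    }

    public static void main(String[] args) {
        // Rows keyed by Integer, as an INT column would typically arrive from SQL.
        Map<Object, String> sqlRowsById = new HashMap<>();
        sqlRowsById.put(42, "row from SQL");

        String rawKeyFromFile = "42";
        // A String key never equals an Integer key, so the untyped lookup misses.
        System.out.println(sqlRowsById.get(ColumnType.STRING.convert(rawKeyFromFile)));   // null
        // After conversion the key types agree and the lookup finds its row.
        System.out.println(sqlRowsById.get(ColumnType.INTEGER.convert(rawKeyFromFile)));  // row from SQL
    }
}
```

Declaring a type per flat-file column lets the conversion happen once at read time, so later lookups compare like with like.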
                


[jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2549:
-----------------------------

    Attachment: SOLR-2549.patch

This patch depends on the enum class "DIHCacheTypes.java" from SOLR-2382, included here for convenience.  Should this issue be considered for committing without SOLR-2382, the class could be renamed and included here by itself.  This is the only dependency on SOLR-2382.

This patch includes unit tests for Delimited & Fixed Width files.
