Posted to issues@ambari.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/13 07:59:00 UTC
[jira] [Commented] (AMBARI-24761) Infra Manager: hive support for archiving Infra Solr

    [ https://issues.apache.org/jira/browse/AMBARI-24761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648803#comment-16648803 ] 

ASF GitHub Bot commented on AMBARI-24761:
-----------------------------------------

kasakrisz opened a new pull request #6: AMBARI-24761 - Infra Manager: hive support for archiving Infra Solr
URL: https://github.com/apache/ambari-infra/pull/6
 
 
   ## What changes were proposed in this pull request?
   
   - When archiving documents stored in Solr collections, the output JSON file is compressed. Change the output format from tar.gz to bzip2, because Hive cannot process tar.gz archives.
   - When serializing Solr documents to JSON, integer-type fields should be serialized as integers, not as strings.
   Instead of 
   ```
   "line_number":"315",
   ```
   use
   ```
   "line_number":315,
   ```
   because these fields are declared as integers in the target Hive table, and Hive's `org.apache.hive.hcatalog.data.JsonSerDe` expects integer values for them.
   - Adjust unit tests (UTs) and integration tests (ITs).
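The two output-format changes above can be sketched in Python. This is a minimal illustration under stated assumptions, not Infra Manager's actual Java implementation. It assumes each Solr document arrives as a dict of string values and that the exported file holds one JSON object per line, which is the layout Hive's JsonSerDe reads from text files. The `INT_FIELDS` set and the helper names `normalize` and `encode_documents` are hypothetical.

```python
import bz2
import json
from typing import Dict, Iterable, Optional

# Hypothetical: fields declared as INT in the target Hive table.
INT_FIELDS = {"line_number"}


def normalize(doc: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Convert declared integer fields to int so json.dumps emits 315, not "315"."""
    return {
        key: int(value) if key in INT_FIELDS and value is not None else value
        for key, value in doc.items()
    }


def encode_documents(docs: Iterable[Dict[str, Optional[str]]]) -> bytes:
    """Serialize docs as JSON lines (one object per line) and bzip2-compress them.

    Hive reads a plain .bz2 text file through Hadoop's BZip2Codec, whereas a
    tar.gz archive wraps the data in a tar container that is not line-oriented JSON.
    """
    lines = "".join(json.dumps(normalize(doc)) + "\n" for doc in docs)
    return bz2.compress(lines.encode("utf-8"))


if __name__ == "__main__":
    payload = encode_documents([{"line_number": "315", "level": "INFO"}])
    # Decompress to show the JSON line Hive would see.
    print(bz2.decompress(payload).decode("utf-8"), end="")
```

Writing the result to a file named e.g. `*.json.bz2` lets Hadoop choose the codec from the extension. A production exporter would more likely stream with `bz2.open(path, "wt")` than build the whole payload in memory.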
   
   ## How was this patch tested?
   
   1. Run UTs and ITs
   2. Manually:
   - Deploy Ambari and a cluster including Infra Solr, Infra Manager, Log Search, Ranger, Hive and HDFS
   - Enable Ranger plugins
   - Create folders on HDFS for the exported data and set permissions so that Hive can read from them and Infra Manager can write to them
   - Export data from Solr to HDFS using Infra Manager
   - Create external tables in Hive for exported data (https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/amb_infra_arch_n_purge_command_line_operations.html)
   - Select data from the tables using Hive queries
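The Hive side of the manual test can be illustrated by deriving an external-table definition from one exported JSON line. This hypothetical helper is not part of Infra Manager. The table name, HDFS location and simplified type mapping are assumptions, and the Hortonworks documentation linked above remains the reference for the real table definitions. The helper shows why the integer change matters: a JSON number maps to an INT column, while the old quoted value would only fit a STRING column.

```python
import json
from typing import Dict

SERDE = "org.apache.hive.hcatalog.data.JsonSerDe"


def hive_type(value: object) -> str:
    """Map a decoded JSON value to a Hive column type (simplified, hypothetical)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"


def external_table_ddl(table: str, location: str, sample_line: str) -> str:
    """Build CREATE EXTERNAL TABLE DDL from one exported JSON line."""
    sample: Dict[str, object] = json.loads(sample_line)
    columns = ",\n  ".join(f"`{name}` {hive_type(v)}" for name, v in sample.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {columns}\n)\n"
        f"ROW FORMAT SERDE '{SERDE}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(external_table_ddl(
        "audit_logs",                      # hypothetical table name
        "hdfs:///archive/ranger_audits",   # hypothetical HDFS folder
        '{"line_number": 315, "level": "INFO"}',
    ))
```

With the pre-change output (`"line_number":"315"`), the same helper would infer `STRING`. A table that declares the column as INT would then receive a quoted value, which the PR notes JsonSerDe does not accept.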
   



> Infra Manager: hive support for archiving Infra Solr
> ----------------------------------------------------
>
>                 Key: AMBARI-24761
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24761
>             Project: Ambari
>          Issue Type: Bug
>          Components: infra
>    Affects Versions: 2.8.0
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.8.0
>
>
> When exporting Solr documents from the Log Search and Ranger collections, save them in a format that Hive can parse.


