
Posted to dev@age.apache.org by "Amr-Shams (via GitHub)" <gi...@apache.org> on 2023/06/07 08:20:21 UTC

[GitHub] [age] Amr-Shams commented on issue #971: Is there any Good documentation how age load works

Amr-Shams commented on issue #971:
URL: https://github.com/apache/age/issues/971#issuecomment-1580180448

   Here is an overview of how the loader parses edge data from a CSV file.
   
   ## 1. CSV file structure:
   
   - **First row**: a header row describing the content of each column.
   - **Subsequent rows**: edge data with the following fields:
     1. Start node ID (integer)
     2. Start node label (string)
     3. End node ID (integer)
     4. End node label (string)
     5. Additional property columns (optional)
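
   The layout is positional. The first four columns identify the two endpoints, and each later column is a property. The sketch below pairs the two, assuming that a property's key is the header name of its column (I have not confirmed the exact mechanism). `format_edge_props()` and `FIRST_PROP_COL` are hypothetical names, not part of AGE.

```c
#include <assert.h>
#include <stdio.h>

/* Columns 0-3 hold start id, start label, end id, end label;
 * any column after that is treated as a property. */
#define FIRST_PROP_COL 4

/* Hypothetical helper (not part of AGE): pair each property column with
 * its header name and write "key=value" pairs separated by ';' into out.
 * Assumes header has at least n_fields entries.
 * Returns the number of properties written, or -1 if out is too small. */
int format_edge_props(const char *const *header, const char *const *fields,
                      size_t n_fields, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = FIRST_PROP_COL; i < n_fields; i++) {
        int w = snprintf(out + used, out_size - used, "%s%s=%s",
                         count > 0 ? ";" : "", header[i], fields[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1;  /* output would be truncated */
        used += (size_t)w;
        count++;
    }
    return count;
}
```

   For example, with the header `start_id,start_label,end_id,end_label,since` and the row `1,Person,2,Person,2019`, the helper writes `since=2019` and returns 1.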
   
   ## 2. Functions and structures used:
   
   **csv_edge_reader struct**: This structure holds the state shared by the parser callbacks, including the fields of the current row, the header, the graph name, graph ID, object name, object ID, and other related information.
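
   A simplified, hypothetical version of that state could look like the struct below. The field names follow the description above, but AGE's real struct may differ in detail. `Oid` is replaced by `unsigned int` so the sketch compiles outside PostgreSQL, and `reader_init()`/`reader_free()` are illustrative helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical reader state passed to the parser callbacks. */
typedef struct {
    size_t row;              /* rows completed so far; row 0 is the header */
    char **header;           /* column names from the header row */
    size_t header_num;       /* number of header columns */
    char **fields;           /* fields of the row being parsed */
    size_t *fields_len;      /* length of each stored field */
    size_t alloc;            /* capacity of fields and fields_len */
    size_t cur_field;        /* fields stored so far for this row */
    char *graph_name;        /* target graph */
    unsigned int graph_oid;  /* stands in for PostgreSQL's Oid */
    char *object_name;       /* name of the edge label being loaded */
    int object_id;           /* numeric ID of that label */
    int error;               /* nonzero once something has failed */
} csv_edge_reader;

/* Zero the state and allocate room for `initial` fields per row.
 * Returns 0 on success, -1 if allocation fails. */
int reader_init(csv_edge_reader *cr, size_t initial)
{
    assert(cr != NULL && initial > 0);
    memset(cr, 0, sizeof *cr);
    cr->alloc = initial;
    cr->fields = malloc(sizeof *cr->fields * initial);
    cr->fields_len = malloc(sizeof *cr->fields_len * initial);
    if (cr->fields == NULL || cr->fields_len == NULL) {
        free(cr->fields);
        free(cr->fields_len);
        cr->fields = NULL;
        cr->fields_len = NULL;
        return -1;
    }
    return 0;
}

/* Release the per-row arrays allocated by reader_init(). */
void reader_free(csv_edge_reader *cr)
{
    free(cr->fields);
    free(cr->fields_len);
    cr->fields = NULL;
    cr->fields_len = NULL;
    cr->alloc = 0;
}
```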
   
   **edge_field_cb()**: This is a callback function called for each field in the CSV file. It stores the field in the csv_edge_reader struct, reallocating memory as needed.
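
   The grow-on-demand storage can be sketched as follows. The `(void *field, size_t len, void *data)` shape matches libcsv's field callback. The trimmed `field_buf` struct and the doubling growth strategy are assumptions made for illustration. Each field is copied with an explicit length because the parser's buffer may not be NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down, hypothetical stand-in for csv_edge_reader. */
typedef struct {
    char **fields;
    size_t *fields_len;
    size_t alloc;      /* capacity of both arrays */
    size_t cur_field;  /* fields stored for the current row */
    int error;         /* set when an allocation fails */
} field_buf;

/* Called once per parsed field: store a NUL-terminated copy,
 * doubling the arrays when they are full. */
void edge_field_cb(void *field, size_t len, void *data)
{
    field_buf *cr = data;
    assert(cr->alloc > 0);

    if (cr->error)
        return;  /* an earlier allocation failed; skip further work */

    if (cr->cur_field == cr->alloc) {
        size_t new_alloc = cr->alloc * 2;
        char **f = realloc(cr->fields, sizeof *f * new_alloc);
        if (f == NULL) { cr->error = 1; return; }
        cr->fields = f;
        size_t *fl = realloc(cr->fields_len, sizeof *fl * new_alloc);
        if (fl == NULL) { cr->error = 1; return; }
        cr->fields_len = fl;
        cr->alloc = new_alloc;
    }

    char *copy = malloc(len + 1);
    if (copy == NULL) { cr->error = 1; return; }
    if (len > 0)
        memcpy(copy, field, len);  /* source may lack a terminator */
    copy[len] = '\0';

    cr->fields[cr->cur_field] = copy;
    cr->fields_len[cr->cur_field] = len;
    cr->cur_field++;
}
```

   Doubling the capacity keeps the number of `realloc()` calls logarithmic in the number of columns.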
   
   **edge_row_cb()**: This is a callback function called for each row in the CSV file. If the row is the first row (header row), it stores the header information. For other rows, it processes the fields to extract start and end nodes, properties, and other edge-related information. It then calls insert_edge_simple() to insert the edge into the graph.
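
   Here is a minimal sketch of that row logic, assuming the field callback has already filled `fields`. The `(int c, void *data)` shape matches libcsv's row callback, where `c` is the character that ended the row. `insert_edge_stub()` is a hypothetical placeholder for `insert_edge_simple()` that only records the edge. Property columns are skipped to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down, hypothetical reader state. */
typedef struct {
    size_t row;        /* 0 until the header row has been consumed */
    char **header;     /* column names taken from row 0 */
    size_t header_num;
    char **fields;     /* current row, filled by the field callback */
    size_t cur_field;
    size_t inserted;   /* edges passed to the insert step */
    long long last_start, last_end;
} row_reader;

/* Hypothetical placeholder for insert_edge_simple(): it only records the
 * edge instead of writing it to a PostgreSQL table. */
static void insert_edge_stub(row_reader *cr, long long start_id,
                             const char *start_label, long long end_id,
                             const char *end_label)
{
    (void)start_label;
    (void)end_label;
    cr->last_start = start_id;
    cr->last_end = end_id;
    cr->inserted++;
}

/* Row callback: c is the character that terminated the row (unused). */
void edge_row_cb(int c, void *data)
{
    row_reader *cr = data;
    (void)c;
    assert(cr->fields != NULL);

    if (cr->row == 0) {
        /* Header row: keep the field strings as column names. */
        cr->header = malloc(sizeof *cr->header * cr->cur_field);
        if (cr->header != NULL) {
            memcpy(cr->header, cr->fields, sizeof *cr->header * cr->cur_field);
            cr->header_num = cr->cur_field;
        } else {
            for (size_t i = 0; i < cr->cur_field; i++)
                free(cr->fields[i]);
        }
    } else {
        /* Data row: columns 1-4 are start ID/label and end ID/label. */
        if (cr->cur_field >= 4)
            insert_edge_stub(cr, strtoll(cr->fields[0], NULL, 10), cr->fields[1],
                             strtoll(cr->fields[2], NULL, 10), cr->fields[3]);
        for (size_t i = 0; i < cr->cur_field; i++)
            free(cr->fields[i]);  /* data-row strings are not kept */
    }
    cr->cur_field = 0;  /* next row starts empty */
    cr->row++;
}
```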
   
   **is_space() and is_term()**: These are utility functions used to customize the CSV parser's behavior when detecting space and line terminator characters, respectively.
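
   With libcsv, such predicates are registered on the parser through `csv_set_space_func()` and `csv_set_term_func()`, each of which takes a function of type `int (*)(unsigned char)`. The sketch below assumes that spaces and tabs count as whitespace and that CR and LF are row terminators. `trim_field()` is a hypothetical helper that only illustrates what a space predicate is used for.

```c
#include <assert.h>
#include <stddef.h>

/* Treat space and tab as whitespace around fields. */
static int is_space(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* Treat carriage return and line feed as row terminators. */
static int is_term(unsigned char c)
{
    return c == '\r' || c == '\n';
}

/* Hypothetical illustration: strip characters accepted by is_space()
 * from both ends of an unquoted field. Sets *start to the first kept
 * character and returns the trimmed length. */
size_t trim_field(const char *s, size_t len, const char **start)
{
    assert(s != NULL && start != NULL);
    while (len > 0 && is_space((unsigned char)*s)) { s++; len--; }
    while (len > 0 && is_space((unsigned char)s[len - 1])) len--;
    *start = s;
    return len;
}

/* With libcsv, the predicates would be registered on a parser p:
 *   csv_set_space_func(&p, is_space);
 *   csv_set_term_func(&p, is_term);
 */
```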
   
   **create_edges_from_csv_file()**: This is the main function responsible for reading the CSV file and processing it with the CSV parser. It initializes the parser, reads the file in chunks, and passes each chunk to the parser, which invokes the callbacks (edge_field_cb() and edge_row_cb()). Once processing is complete, it cleans up the allocated memory.
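
   The read-in-chunks loop can be sketched without libcsv by substituting a toy tokenizer. `toy_parse()` and `toy_finish()` stand in for libcsv's `csv_parse()` and `csv_fini()` and do not handle quoted fields. The counting callbacks stand in for `edge_field_cb()` and `edge_row_cb()`. All names here are hypothetical. The point is that parser state survives between chunks, so a field split across two `fread()` calls is still reassembled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy parser state, standing in for libcsv's struct
 * csv_parser. It splits on ',' and CR/LF only; quoted fields are
 * not supported. State persists across chunks. */
typedef struct {
    char field[256];  /* current field; longer fields are truncated */
    size_t len;
    int in_row;       /* 1 once the current row has any content */
} toy_parser;

typedef void (*field_cb)(void *field, size_t len, void *data);
typedef void (*row_cb)(int c, void *data);

/* Feed one chunk (stand-in for csv_parse()). */
static void toy_parse(toy_parser *p, const char *buf, size_t n,
                      field_cb on_field, row_cb on_row, void *data)
{
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (c == ',') {
            on_field(p->field, p->len, data);
            p->len = 0;
            p->in_row = 1;
        } else if (c == '\r' || c == '\n') {
            if (p->in_row) {  /* ignore blank lines and the LF of CRLF */
                on_field(p->field, p->len, data);
                on_row((unsigned char)c, data);
            }
            p->len = 0;
            p->in_row = 0;
        } else {
            if (p->len < sizeof p->field)
                p->field[p->len++] = c;
            p->in_row = 1;
        }
    }
}

/* Flush a last row that lacks a trailing newline (stand-in for csv_fini()). */
static void toy_finish(toy_parser *p, field_cb on_field, row_cb on_row, void *data)
{
    if (p->in_row) {
        on_field(p->field, p->len, data);
        on_row(-1, data);  /* -1: row ended by end of input */
    }
    p->len = 0;
    p->in_row = 0;
}

/* Demo callbacks standing in for edge_field_cb() and edge_row_cb(). */
typedef struct { size_t fields, rows; } counts;
static void count_field(void *f, size_t len, void *d) { (void)f; (void)len; ((counts *)d)->fields++; }
static void count_row(int c, void *d) { (void)c; ((counts *)d)->rows++; }

/* Read fixed-size chunks, feed each to the parser, then flush.
 * Returns 0 on success, -1 on a read error. */
int load_edges_from_stream(FILE *fp, field_cb on_field, row_cb on_row, void *data)
{
    toy_parser p = {0};
    char buf[16];  /* small on purpose so fields straddle chunk boundaries */
    size_t n;

    assert(fp != NULL);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        toy_parse(&p, buf, n, on_field, on_row, data);
    if (ferror(fp))
        return -1;
    toy_finish(&p, on_field, on_row, data);
    return 0;
}
```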
   
   
   From reading the code, I found that the CSV file should follow these constraints:
   
   **Header row**: The CSV file must have a header row that describes the content of each column. The code assumes that the first row of the file is the header row.
   
   **Field values**: The start and end node IDs must be integers. The start and end node labels should be strings.
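
   One way to enforce the integer-ID rule strictly is to check the end pointer and `errno` after calling `strtoll()`. This rejects a value such as `12abc` or an out-of-range number instead of silently converting it. The `parse_node_id()` helper is hypothetical, and I have not checked whether AGE validates IDs this strictly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accept a node ID only if the whole field is a
 * base-10 integer that fits in a long long. Returns 0 on success and
 * stores the value in *out; returns -1 otherwise. */
int parse_node_id(const char *field, long long *out)
{
    char *end;

    assert(field != NULL && out != NULL);
    if (*field == '\0')
        return -1;                 /* empty field */
    errno = 0;
    long long v = strtoll(field, &end, 10);
    if (errno == ERANGE)
        return -1;                 /* does not fit in long long */
    if (end == field || *end != '\0')
        return -1;                 /* no digits, or trailing junk */
    *out = v;
    return 0;
}
```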
   


-- 
This is an automated message from the Apache Git Service.