Posted to dev@age.apache.org by "rafsun42 (via GitHub)" <gi...@apache.org> on 2023/06/15 19:51:41 UTC

[GitHub] [age] rafsun42 opened a new issue, #995: Research functions that extracts label ID

rafsun42 opened a new issue, #995:
URL: https://github.com/apache/age/issues/995

   Research the following three functions. These functions extract the label ID from a graphid. 
   
   - How can we stop that while keeping the functions working as before? 
   - Would that be compatible with your solution? 
   - Give some examples of Cypher queries that use these functions in their query plan trees.
   
   1. `entity_exists` in `cypher_utils.c`
   2. `get_label_name` in `agtype.c`
   3. `filter_vertices_on_label_id` in `cypher_clause.c`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@age.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [age] CapnSpek commented on issue #995: Research functions that extracts label ID

Posted by "CapnSpek (via GitHub)" <gi...@apache.org>.
CapnSpek commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1602934480

   I would like to work on the 2nd function.




[GitHub] [age] panosfol commented on issue #995: Research functions that extracts label ID

Posted by "panosfol (via GitHub)" <gi...@apache.org>.
panosfol commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1627313474

   So I researched the above 3 functions and here is what I found:
   - The `entity_exists()` function uses the macro `GET_LABEL_ID`, which takes the `graphid` (in which the `label_id` was bit-shifted into the high bits) and reverses the shift to recover the `label_id`. Here is the macro: `(((uint64)id) >> ENTRY_ID_BITS)`
   - The `get_label_name()` function calls the `get_graphid_label_id()` function, which follows the same logic:
     `return (int32)(((uint64)gid) >> ENTRY_ID_BITS);`
   - The `filter_vertices_on_label_id()` function creates a function call for the function `_extract_label_id`. That function calls the same function as `get_label_name()` to get the `label_id`.
   
   
   According to my solution all of these functions would stay the same, but their function arguments would change to take the value from the new column that I suggested instead of the `id`. The new column would hold the `label_id` after the bit shift, so we can minimize the changes that we would need to apply to the code.
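
A minimal Python sketch of the extraction described above, assuming `ENTRY_ID_BITS` is 48 (the 32+16 bits mentioned elsewhere in this thread). The helper names `get_label_id` and `get_entry_id` are illustrative and only mirror the quoted C expressions; this is not the AGE source.

```python
ENTRY_ID_BITS = 32 + 16                   # assumed width of the entry (sequence) part
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1  # low 48 bits
UINT64_MASK = (1 << 64) - 1               # emulates the (uint64) cast in the C macro


def get_label_id(graphid: int) -> int:
    """Mirror of (((uint64)id) >> ENTRY_ID_BITS): the high bits hold the label ID."""
    return (graphid & UINT64_MASK) >> ENTRY_ID_BITS


def get_entry_id(graphid: int) -> int:
    """The low ENTRY_ID_BITS bits hold the per-label sequence value."""
    return graphid & ENTRY_ID_MASK


# Two vertex IDs reported in this thread.
print(get_label_id(844424930131969), get_entry_id(844424930131969))    # 3 1
print(get_label_id(1125899906842625), get_entry_id(1125899906842625))  # 4 1
```

Any replacement for `graphid` has to supply the value of `get_label_id` from somewhere else, since all three functions ultimately depend on it.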




[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

Posted by "WendelLana (via GitHub)" <gi...@apache.org>.
WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1638709422

   Please note that the `label_name` field in the `cypher_target_node` struct may not always contain a value. This situation arises, for example, with the queries I provided in my previous comment. Whether `label_name` has a value in `create_vertex` depends on whether the label is present in the `CREATE` clause of the query. When the variable is declared in the `MATCH` clause and referenced again in the `CREATE` clause, the label cannot be specified again, so `label_name` is left empty.
   
   @rafsun42 Therefore, I believe we would have to scan a junction table (with every vertex ID and label ID) or each duplicate of the label tables (the `Label_hash` tables, which only hold the vertex ID) to get the label table ID or label name. As proposed, we could use a trimmed table indexed with the hash method to make this scan faster.




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1625694976

   @WendelLana Very good. Can you find a way to make these functions work without extracting label ID?




Re: [I] Research functions that extracts label ID [age]

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 closed issue #995: Research functions that extracts label ID
URL: https://github.com/apache/age/issues/995




[GitHub] [age] Zainab-Saad commented on issue #995: Research functions that extracts label ID

Posted by "Zainab-Saad (via GitHub)" <gi...@apache.org>.
Zainab-Saad commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1593972068

   I would like to work on the 1st function.




[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

Posted by "WendelLana (via GitHub)" <gi...@apache.org>.
WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1612223835

   I've been investigating these functions, and here are some queries that we can use for debugging:
   **1. entity_exists**
   This function checks whether the entity bound to a variable has been deleted. It returns true if the entity still exists and false otherwise.
   Query
   ```
   SELECT * FROM cypher('graph_name', $$
   MATCH (x:Developer) 
   CREATE (x)-[r:PARTICIPATES]->(p:Project) 
   RETURN x
   $$) AS (vertex agtype);
   ```
   
   **2. get_label_name**
   This function returns the label name. It is used by the AGE functions `startNode()` and `endNode()`.
   Query
   ```
   SELECT * FROM cypher('graph_name', $$
       MATCH (d:Developer)-[r]-(p:Project)
       RETURN startNode(r), endNode(r)
   $$) as (start_node agtype, end_node agtype);
   ```
   I'll look more into these functions and their query plan trees to think of solutions for the label ID.




[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

Posted by "WendelLana (via GitHub)" <gi...@apache.org>.
WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631865656

   As Panos pointed out, these functions use the `graphid` to retrieve the label ID. I believe your solution wouldn't work, because right now the code extracts the label ID from the `graphid` and then uses it to find the specific vertex label table (`schema.vertex_label`).
   If `graphid` becomes a simple sequential ID or just the entry ID, we could use a junction table that stores each vertex ID and the label OID. This way, when these functions need the label for a vertex ID, they can scan this junction table to find the label OID, then scan the correct label table and retrieve all the information of the vertex.
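
The two-step lookup described above can be sketched in Python with plain dictionaries standing in for tables. Everything here is hypothetical: the OID values, the table contents, and the name `find_vertex` are made up for illustration and are not part of AGE.

```python
# Hypothetical junction table: vertex ID -> OID of the label table that stores it.
vertex_label_junction = {1: 16401, 2: 16402, 3: 16401}

# Hypothetical catalog: label table OID -> label name.
label_catalog = {16401: "Developer", 16402: "Project"}

# Hypothetical label tables: label name -> {vertex ID -> properties}.
label_tables = {
    "Developer": {1: {"name": "Ada"}, 3: {"name": "Linus"}},
    "Project": {2: {"title": "AGE"}},
}


def find_vertex(vertex_id):
    """Step 1: resolve the label via the junction table.
    Step 2: read the row from that label's table."""
    label_oid = vertex_label_junction.get(vertex_id)
    if label_oid is None:
        return None  # vertex ID is not known to the junction table
    label = label_catalog[label_oid]
    properties = label_tables[label].get(vertex_id)
    if properties is None:
        return None  # junction entry exists but the row is gone
    return {"id": vertex_id, "label": label, "properties": properties}


print(find_vertex(2))
```

The cost of this design is the extra lookup per vertex, which is why the thread discusses keeping the junction table small and indexed.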
   
   




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1607604427

   @CapnSpek 
   
   > For example if there's a query
   > 
   > MATCH (n:second_label) RETURN n
   > 
   > Then it can simply look for vertices with ids starting from 844424930131969 + 281474976710656.
   
   Does this query's QPT (query plan tree, which you can find with the EXPLAIN command) say it starts looking for vertices from the mentioned ID?




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1638933420

   @WendelLana If we store `label_id` as a column in the vertex tables along with the `id` and `properties` columns, would it be possible to make `entity_exists` work without using the junction table or `Label_hash`? 




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1593633913

   I will do 3.




[GitHub] [age] panosfol commented on issue #995: Research functions that extracts label ID

Posted by "panosfol (via GitHub)" <gi...@apache.org>.
panosfol commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631587910

   @rafsun42 You are right, I will look into it!
   




[GitHub] [age] CapnSpek commented on issue #995: Research functions that extracts label ID

Posted by "CapnSpek (via GitHub)" <gi...@apache.org>.
CapnSpek commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1605209822

   I think I have something really good here.
   Skipping the part of how I discovered it, (you can test it yourself)
   
   The first vertex created in any graph, in any database, with any name, any label, any number of properties and any values has this particular id: 844424930131969
   
   Now when you create a vertex with the same label again, the next vertex simply gets the +1 id. (SAME LABEL IS IMPORTANT).
   
   However, when you create a vertex with another label (the label name does not matter, neither do the properties), it will get exactly this id: 1125899906842625
   
   And the difference of the id of the first vertex with first label and id of the first vertex with second label is exactly: 281474976710656
   
   Now this is important.
   
   Again, when you next create a vertex with the second label (any properties), it simply gets the id of the first vertex of the second label + 1.
   
   When you create a vertex with a third label (new label), it will get exactly this id irrespective of the properties: 1407374883553281
   
   Again the difference of first vertex of third label and first vertex of second label is exactly: 281474976710656
   
   Again, if you create another vertex with the third label, it gets the id of the first vertex of the third label + 1.
   
   If you create a vertex with a new label (fourth label), again, irrespective of properties it will get exactly this id: 1688849860263937
   
   And again the difference of ids of first vertex of fourth label and first vertex of third label is exactly: 281474976710656
   
   My point being, vertex ids are not tied to labels in the sense of being tied to label names. Instead, they are tied to the order in which the label was created.
   
   Ids of vertices with the first label of any created graph start with: 844424930131969
   Then it's a simple incrementing series for all subsequent vertices.
   In case a vertex is deleted, its id is not reused, instead, the next created vertex gets the incremented last used id.
   
   Ids of vertices with second label of any created graph start with: 1125899906842625
   Then it simply increments for all next vertices.
   
   And the difference between any two labels (their first vertices) is always 281474976710656.
   
   It seems this is how AGE keeps track of which label to look into.
   For example if there's a query
   
   MATCH (n:second_label) RETURN n
   
   Then it can simply look for vertices with ids starting from 844424930131969 + 281474976710656.
   
   




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1629853333

   **3. filter_vertices_on_label_id**
   
   This function is used internally by the following query:
   ```sql
   MATCH(:Person)-[:IN]->(t:Title) RETURN t
   ```
   The QPT is:
   ```
    Gather  (cost=735883.94..1181947.48 rows=104664 width=32)
      Workers Planned: 2
      ->  Parallel Hash Join  (cost=734883.94..1170481.08 rows=43610 width=32)
            Hash Cond: (_age_default_alias_0.end_id = t.id)
            ->  Parallel Seq Scan on "IN" _age_default_alias_0  (cost=0.00..284748.30 rows=43610 width=8)
                  Filter: ((_extract_label_id(start_id))::integer = 4)
            ->  Parallel Hash  (cost=533288.42..533288.42 rows=4145242 width=270)
                  ->  Parallel Seq Scan on "Title" t  (cost=0.00..533288.42 rows=4145242 width=270)
   ```
   The function adds a filter condition to the query plan. In the above QPT, the line `Filter: ((_extract_label_id(start_id))::integer = 4)` is built by it.
   
   Because the person node is filtered only by label (i.e. `(:Person)`) and no property filter or variable is used, internally the `Person` table is not joined with the `IN` table. `_extract_label_id` can tell which label `start_id` belongs to, which eliminates the join. 
   
   In order to drop the concept of `graphid`, we will need to stop using the function `_extract_label_id`. One alternative is to actually do the join, except not with the `Person` table: a duplicate table of `Person` can be used. It can be trimmed to have only the ID column and indexed strategically to reduce the join time. This solution is discussed in detail in issue #1021.
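
The trade-off above can be illustrated with a small Python simulation that compares the current label-extraction filter with a join against an ID-only table. It assumes `ENTRY_ID_BITS` is 48 and that `Person` has label ID 4 (taken from the filter in the plan); the other label IDs, the edge rows, and all function names are hypothetical. A Python `set` stands in for the hash-indexed duplicate table.

```python
ENTRY_ID_BITS = 48  # assumed, per the (32+16) bits mentioned in this thread


def make_graphid(label_id, entry_id):
    """Build an ID the way the thread describes: label ID in the high bits."""
    return (label_id << ENTRY_ID_BITS) | entry_id


PERSON, TITLE, OTHER = 4, 5, 6  # only PERSON = 4 comes from the plan; the rest are made up

# Rows of the "IN" edge table as (start_id, end_id).
in_edges = [
    (make_graphid(PERSON, 1), make_graphid(TITLE, 1)),
    (make_graphid(OTHER, 1), make_graphid(TITLE, 2)),
    (make_graphid(PERSON, 2), make_graphid(TITLE, 2)),
]


def filter_by_extracted_label(edges, label_id):
    """Current approach: derive the label from start_id, no join needed."""
    return [e for e in edges if (e[0] >> ENTRY_ID_BITS) == label_id]


# Trimmed duplicate of Person holding only IDs; a set gives hash lookups.
person_hash = {make_graphid(PERSON, 1), make_graphid(PERSON, 2)}


def filter_by_join(edges, id_table):
    """Alternative: semi-join the edge table against the ID-only table."""
    return [e for e in edges if e[0] in id_table]


same = filter_by_extracted_label(in_edges, PERSON) == filter_by_join(in_edges, person_hash)
print(same)  # True
```

The IDs in `person_hash` still encode the label here only so that the two approaches can be compared on the same data; with `graphid` dropped, the join would work with any ID scheme.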




[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

Posted by "WendelLana (via GitHub)" <gi...@apache.org>.
WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1630116157

   I've analyzed the query plan trees for the queries that I provided before; I'll write them here.
   
   In the `entity_exists` function query that returns false, we have the following query plan:
   ````
                                           QUERY PLAN
   -------------------------------------------------------------------------------------------
    Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
      ->  Subquery Scan on _  (cost=0.00..0.00 rows=1 width=32)
            ->  Result  (cost=0.00..0.00 rows=0 width=256)
                  ->  Custom Scan (Cypher Delete)  (cost=0.00..0.00 rows=0 width=32)
                        ->  Subquery Scan on __1  (cost=0.00..343.00 rows=1200 width=32)
                              ->  Seq Scan on "Dev" x  (cost=0.00..331.00 rows=1200 width=64)
   (6 rows)
   ````
   
   In the `entity_exists` function query that returns true, we have the following query plan:
   ````
                                      QUERY PLAN
   ---------------------------------------------------------------------------------
    Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
      ->  Subquery Scan on _  (cost=0.00..1549.00 rows=1200 width=32)
            ->  Seq Scan on "Developer" x  (cost=0.00..1537.00 rows=1200 width=224)
   (3 rows)
   ````
   
   In the `get_label_name` function query, we have the following query plan:
   ```` 
                                                       QUERY PLAN
   -------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..37915106.89 rows=139750 width=64)
      Workers Planned: 2
      ->  Nested Loop  (cost=0.00..37900131.89 rows=58229 width=64)
            Join Filter: (((r.start_id = d.id) AND (r.end_id = p.id)) OR ((r.end_id = d.id) AND (r.start_id = p.id)))
            ->  Parallel Append  (cost=0.00..35.46 rows=809 width=56)
                  ->  Parallel Seq Scan on "PARTICIPATES" r_2  (cost=0.00..15.71 rows=571 width=56)
                  ->  Parallel Seq Scan on "FRIENDS_WITH" r_3  (cost=0.00..15.71 rows=571 width=56)
                  ->  Parallel Seq Scan on _ag_label_edge r_1  (cost=0.00..0.00 rows=1 width=56)
            ->  Nested Loop  (cost=0.00..18047.00 rows=1440000 width=16)
                  ->  Seq Scan on "Developer" d  (cost=0.00..22.00 rows=1200 width=8)
                  ->  Materialize  (cost=0.00..28.00 rows=1200 width=8)
                        ->  Seq Scan on "Project" p  (cost=0.00..22.00 rows=1200 width=8)
   (12 rows)
   ````
   
   As the QPT shows, they don't utilize the `_extract_label_id` function, and there's only a simple join filter to scan the edge table. I believe your solution would work just fine with all these functions.
   
   Additionally, I believe there is no need to duplicate each label table, `Person` for example, because the `properties` column is already present in the `_ag_label_vertex` table. However, I'm still analyzing how much code we would need to change to ensure everything continues to work.
   
   My idea is that the query plan would first scan the `Person` table with only the IDs, and then in some cases scan `_ag_label_vertex`, and in other cases simply filter by the properties in `_ag_label_vertex`.




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631262698

   @WendelLana These functions do extract the label ID. But they don't use a function that shows up in the QPT (like `_extract_label_id`). Rather, the functions being used for extraction are internal C functions (not exported Postgres functions). Interesting find. Good work.
   
   Could you look for the internal functions where the label is extracted? You can take a hint from panos's comment above. Then, let us know if the solution I proposed can be adopted there. 




[GitHub] [age] CapnSpek commented on issue #995: Research functions that extracts label ID

Posted by "CapnSpek (via GitHub)" <gi...@apache.org>.
CapnSpek commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631941896

   Currently, the ids carry 2 types of information:
   1.) The label
   2.) The sequence
   
   I believe irrespective of the type of query, `_extract_label_id` is used whenever a vertex is returned.
   
   I can prove this in the form of an example.
   
   Let's start by creating 2 vertices in the graph with 2 different labels.
   
   So the contents of _ag_label_vertex are
   ![image](https://github.com/apache/age/assets/109627745/07fd2f29-3b0a-4e5b-8716-0787b60a809c)
   
   Next, I'll update the id of John from "1125899906842625" to "844424930131970".
   So the new contents are: -
   ![image](https://github.com/apache/age/assets/109627745/d0c6be5f-dfe4-4377-a80b-ff4c2b0f4f26)
   
   The contents of person_1 and person_2 respectively are: -
   
   ![image](https://github.com/apache/age/assets/109627745/bf85002f-0efe-4c27-8fa1-ff8457089567)
   
   
   Now, if we perform some simple `MATCH` queries
   Such as
   MATCH (n) RETURN n;
   MATCH (n:person_1) RETURN n;
   MATCH (n:person_2) RETURN n;
   
   You'll see that even if we filter by person_2, the label actually is person_1
   
   ![image](https://github.com/apache/age/assets/109627745/964532c0-efef-4b2e-a8f5-b4f21413b612)
   
   This implies that even in queries where I filter by specifying the label "person_2", the search is made on the person_2 table, but the label is decided by extracting the `label_id` from the id alone.
   
   It is done by bit-shifting the id to the right by (32+16) bits (the macro `ENTRY_ID_BITS`).
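
The experiment can be mimicked with a small Python simulation. It assumes `person_1` and `person_2` received label IDs 3 and 4 (consistent with the IDs quoted above), and the `match` function is a hypothetical stand-in for how the returned label is derived, not AGE's implementation.

```python
ENTRY_ID_BITS = 32 + 16

# Assumed label IDs for the two labels created in the experiment.
label_names = {3: "person_1", 4: "person_2"}

# Label tables before the manual update; properties other than John's are omitted.
tables = {
    "person_1": [{"id": 844424930131969, "properties": {}}],
    "person_2": [{"id": 1125899906842625, "properties": {"name": "John"}}],
}


def match(label):
    """Scan the table chosen by the label filter, but report the label
    derived from each row's id, as the experiment suggests AGE does."""
    return [
        {
            "id": row["id"],
            "label": label_names[row["id"] >> ENTRY_ID_BITS],
            "properties": row["properties"],
        }
        for row in tables[label]
    ]


# Manual update from the experiment: John's id is rewritten in place.
tables["person_2"][0]["id"] = 844424930131970

print(match("person_2")[0]["label"])  # person_1
```

The row is still found through the `person_2` table, yet its reported label follows the id.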
   
   




[GitHub] [age] panosfol commented on issue #995: Research functions that extracts label ID

Posted by "panosfol (via GitHub)" <gi...@apache.org>.
panosfol commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1593660790

   I would like to work on this




[GitHub] [age] panosfol commented on issue #995: Research functions that extracts label ID

Posted by "panosfol (via GitHub)" <gi...@apache.org>.
panosfol commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1637134723

   > **3. filter_vertices_on_label_id**
   > 
   > This function is used internally by the following query:
   > 
   > ```sql
   > MATCH (:Person)-[e:IN]->(:Title{primaryTitle:'Inception'}) RETURN e 
   > ```
   > 
   > The QPT is:
   > 
   > ```
   >  Gather  (cost=569028.74..853951.48 rows=105 width=32)
   >    Workers Planned: 2
   >    ->  Parallel Hash Join  (cost=568028.74..852940.98 rows=44 width=32)
   >          Hash Cond: (e.end_id = _age_default_alias_0.id)
   >          ->  Parallel Seq Scan on "IN" e  (cost=0.00..284748.30 rows=43610 width=29)
   >                Filter: ((_extract_label_id(start_id))::integer = 4)
   >          ->  Parallel Hash  (cost=567965.30..567965.30 rows=5075 width=8)
   >                ->  Parallel Seq Scan on "Title" _age_default_alias_0  (cost=0.00..567965.30 rows=5075 width=8)
   >                      Filter: (properties @> agtype_build_map('primaryTitle'::text, '"Inception"'::agtype))
   > ```
   > 
   > The function adds a filter condition to the query plan. In the above QPT, the line `Filter: ((_extract_label_id(start_id))::integer = 4)` is built by it.
   > 
   > Because the person node is filtered only by label (i.e. `(:Person)`) and no property filter or variable is used, internally the `Person` table is not joined with the `IN` table. `_extract_label_id` can tell which label `start_id` belongs to, which eliminates the join.
   > 
   > In order to drop the concept of `graphid`, we will need to stop using the function `_extract_label_id`. One alternative is to actually do the join, except not with the `Person` table: a duplicate table of `Person` can be used. It can be trimmed to have only the ID column and indexed strategically to reduce the join time.
   > 
   > So, the `Person` table will have a duplicate, `Person_hash`, with only the ID column. Dropping the properties column will make the join faster, since each disk read can load more rows. The ID column will then be indexed with the hash method, so a hash join can be performed. With less data to load from disk and a hash index, the join can be made faster than a regular join (which is performed from Title).
   > 
   > @panosfol @Zainab-Saad @WendelLana @CapnSpek What do you guys think of this solution? Can the other two functions that you researched adopt this solution?
   
   From my understanding, this solution would require us to know the `label_name` in order to find the correct duplicate table. Therefore I've researched which of the above 3 functions (`entity_exists()`, `get_label_name()`, `filter_vertices_on_label_id()`) have access to the `label_name` and in what context they are called.
   
   First, the `entity_exists()` function is called by `merge_vertex()` and `create_vertex()`, both of which take a `cypher_target_node` struct as an argument. That struct has a `char *label_name` field, so we can access the `label_name` in the context where `entity_exists()` is called.
   
   The `filter_vertices_on_label_id()` function has `char *label` as an argument, so we have the `label_name` in the context of the function.
   
   Finally, `get_label_name()` is called by `age_startnode()` and `age_endnode()` in the executor stage, and it has no way to access the `label_name` without using the `graphid`. The problem is that `age_startnode()` and `age_endnode()` are not internal C functions; their purpose is to take an `edge` and return either the start node or the end node. The only way to accomplish that right now is through the node's `ID`, because that is the only information the `edge` holds about its 2 vertices.
   
   In order to completely remove the `graphid` utility, we need to come up with a way that works specifically for the `edge` structure, because `get_label_name()` is called by the 2 functions (`age_startnode()`, `age_endnode()`) that have only an edge as argument. Otherwise, we would need to change or remove those 2 functions completely.




[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

Posted by "WendelLana (via GitHub)" <gi...@apache.org>.
WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1641092269

   @rafsun42 I don't believe so. In this context, we only have access to the user input. Therefore, getting everything stored about the vertex would need a scan.




[GitHub] [age] panosfol commented on issue #995: Research functions that extracts label ID

Posted by "panosfol (via GitHub)" <gi...@apache.org>.
panosfol commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1605611963

   > The first vertex created of any graph in any database with any name, any label, any number of properties and any values have this particular id: 844424930131969
   
   This isn't completely accurate.
   The `id`s of vertices are tied to the `label id`. The first label table that is created in any graph is `_ag_label_vertex` with `id = 1`, and the second is `_ag_label_edge` with `id = 2`. 
   The vertex/edge `id` is generated by bit-shifting the `label id` and combining it with a sequence value, which gives the incrementing `id`s of the different vertices and edges.
   
   Your comment was a bit incorrect because the first vertex doesn't always have the `id` you provided. If you create a vertex with no label, the resulting `id` will be `281474976710657`, because that's the result of applying the bit shift to the number `1` (since `1` is the label id for vertices that have no label) and adding the first sequence value.
   
   So basically `id`s are entirely tied to the `label id`, and that is tied to the order in which the labels are created, always starting at `3` for the first label created (since `1` and `2` are reserved by the default labels when we create the graph). After applying the shift you get the different `id`s of vertices and edges.
   I think the mathematical property that you described is entirely a result of the shift that we are using, and it is not helpful by itself since it is just a side effect of the creation process.
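
The arithmetic behind these numbers can be checked with a short Python sketch, assuming the shift width is 48 bits (the 32+16 of `ENTRY_ID_BITS` mentioned elsewhere in this thread). `make_graphid` is an illustrative name, not an AGE function.

```python
ENTRY_ID_BITS = 32 + 16  # assumed shift width


def make_graphid(label_id, entry_id):
    """Place the label ID in the high bits and the sequence value in the low bits."""
    return (label_id << ENTRY_ID_BITS) | entry_id


# Label 1 is the default vertex label, so the first unlabeled vertex gets:
print(make_graphid(1, 1))  # 281474976710657

# The first user-created label gets label ID 3, so its first vertex gets:
print(make_graphid(3, 1))  # 844424930131969

# Consecutive labels differ by exactly 2**48, the "constant difference" observed earlier.
print(make_graphid(4, 1) - make_graphid(3, 1))  # 281474976710656
```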




[GitHub] [age] rafsun42 commented on issue #995: Research functions that extracts label ID

Posted by "rafsun42 (via GitHub)" <gi...@apache.org>.
rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631575102

   > So i researched the above 3 functions and here is what I found:
   > 
   > * The `entity_exists()` function uses the macro `GET_LABEL_ID`, which takes the `graphid` (in which the `label_id` was bit-shifted into the high bits) and reverses the shift to recover the `label_id`. Here is the macro: `(((uint64)id) >> ENTRY_ID_BITS)`
   > * The `get_label_name()` function calls the `get_graphid_label_id()` function that follows the above logic :
   >   `return (int32)(((uint64)gid) >> ENTRY_ID_BITS);`
   > * The `filter_vertices_on_label_id()` function creates a function call for the function `_extract_label_id`. That function calls the same function as `get_label_name()` to get the `label_id`.
   > 
   > According to my solution all of these functions would stay the same, but their function arguments would change to take the value from the new column that I suggested instead of the `id`. The new column would hold the `label_id` after the bit shift, so we can minimize the changes that we would need to apply to the code.
   
   @panosfol 
   
   Could you elaborate more? Whichever function passes the value from the new column to these three functions, how would it access the new column's value? Consider what data the caller function already has access to in its context.




[GitHub] [age] CapnSpek commented on issue #995: Research functions that extracts label ID

Posted by "CapnSpek (via GitHub)" <gi...@apache.org>.
CapnSpek commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1631945362

   Currently, the ids carry 2 kinds of information: sequence and label id.
   
   The solution can be to separate the two kinds of information.
   
   There can be a `sequence_id`, which is just for the sequence,
   
   and a `label_id`, which is just for labels.
   
   The separation of information could be achieved by a new column, or through a junction table.
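
As a rough Python sketch of the "new column" variant, the two pieces can be stored as separate fields. The `VertexRow` type and the `split_graphid` helper are hypothetical, and the 48-bit width is the same assumption used elsewhere in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class VertexRow:
    """Hypothetical row layout with the two kinds of information separated."""
    sequence_id: int
    label_id: int
    properties: dict = field(default_factory=dict)


def split_graphid(graphid, entry_id_bits=48):
    """Convert a current combined id into the separated form (assumed 48-bit entry part)."""
    return VertexRow(
        sequence_id=graphid & ((1 << entry_id_bits) - 1),
        label_id=graphid >> entry_id_bits,
    )


row = split_graphid(1125899906842625)
print(row.sequence_id, row.label_id)  # 1 4
```

One caveat with this layout: the ids quoted in this thread restart at 1 for each label, so a standalone `sequence_id` would need a graph-wide sequence (or be paired with `label_id`) to stay unique.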
   

