Neo4j
Neo4j is a graph database. We will use it to store the CG (Call Graph) and CFG (Control Flow Graph) generated by ArkAnalyzer.
We will develop an ArkAnalyzer extension called Ark2Graph. The steps are as follows:
- Get CG and CFG from ArkAnalyzer.
- Traverse the CG and CFG.
- Save the results into Neo4j.
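The three steps above could be sketched roughly as follows. The `CallGraphSketch` type, the `Method` label and the `CALLS` relationship type are hypothetical names chosen for illustration, not ArkAnalyzer's actual API. The sketch only shows a breadth-first traversal that turns a call graph into node and edge rows that could later be passed to Neo4j as query parameters.

```typescript
// Hypothetical in-memory call graph: method signature -> signatures it calls.
// This is NOT ArkAnalyzer's real data structure, only an illustration.
type CallGraphSketch = Map<string, string[]>;

interface NodeRow { signature: string; }
interface EdgeRow { caller: string; callee: string; }

// Breadth-first traversal from an entry method. Each reachable method is
// collected once; every call edge leaving a reachable method is collected.
function collectRows(graph: CallGraphSketch, entry: string): { nodes: NodeRow[]; edges: EdgeRow[] } {
  const visited = new Set<string>([entry]);
  const nodes: NodeRow[] = [];
  const edges: EdgeRow[] = [];
  const queue: string[] = [entry];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    nodes.push({ signature: current });
    const callees = graph.get(current) || [];
    for (const callee of callees) {
      edges.push({ caller: current, callee: callee });
      if (!visited.has(callee)) {
        visited.add(callee);
        queue.push(callee);
      }
    }
  }
  return { nodes, edges };
}

// Cypher that could store the rows (label and relationship type are assumptions).
const SAVE_NODES = "UNWIND $nodes AS row MERGE (m:Method {signature: row.signature})";
const SAVE_EDGES =
  "UNWIND $edges AS row " +
  "MATCH (a:Method {signature: row.caller}), (b:Method {signature: row.callee}) " +
  "MERGE (a)-[:CALLS]->(b)";

const sampleGraph: CallGraphSketch = new Map<string, string[]>([
  ["main", ["parse", "run"]],
  ["parse", []],
  ["run", ["parse"]],
]);
const sampleRows = collectRows(sampleGraph, "main");
console.log(sampleRows.nodes.length, sampleRows.edges.length); // 3 3
```

Keeping the traversal separate from the database calls means the rows can be inspected or tested without a running Neo4j server.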
So we need to deploy a Neo4j server on the machine where we develop this extension.
1. Download and Run Neo4j Server
First, we can download an installer from here.
Before downloading, we need to fill in a form with our personal information; the website will then display an activation key. Copy this key and save it to a local file.
After the download finishes, run the installer to install Neo4j. Choose where to install the application and where to store the data, and the server will then start automatically. The Neo4j Desktop window will open, where we can view the status of the server and manage it.
Don't close the command window that opens automatically, or errors will occur.
2. Connect to Neo4j Server
2.1 Set Username and Password
First, we open Neo4j Browser from Neo4j Desktop, and then run Cypher in the console:
ALTER USER neo4j SET PASSWORD 'neo4j123';
This command will change the password of user neo4j.
Or we can create a new user and grant admin privilege to it:
CREATE USER new_user SET PASSWORD 'new_password' CHANGE NOT REQUIRED;
GRANT ROLE admin TO new_user;
Then we can use this username and password to visit http://localhost:7474 in a web browser, or to connect to the Neo4j server at neo4j://localhost:7687 from a terminal.
2.2 Connect to Neo4j Server in TS
First, we need to install the Neo4j driver:
npm install neo4j-driver
Then we need to edit the configuration file of the Neo4j server (neo4j.conf) so that it accepts connections from other network interfaces:
# Use 0.0.0.0 to bind to all network interfaces on the machine
server.default_listen_address=0.0.0.0
Then we can create a TypeScript file to test this connection:
import neo4j from "neo4j-driver";
// Change this to the address where your Neo4j server is running
const uri = "bolt://172.18.48.1:7687";
const username = "neo4j";
const password = "neo4j123";
const driver = neo4j.driver(uri, neo4j.auth.basic(username, password));
async function createNode(): Promise<void> {
  // Open the session inside the function so it is always closed with it
  const session = driver.session();
  try {
    const result = await session.run(
      "CREATE (n:Person {name: $name, age: $age}) RETURN n",
      { name: "Neo4j", age: 20 }
    );
    console.log(result.records[0].get("n"));
  } catch (error) {
    console.error("Error creating node:", error);
  } finally {
    await session.close();
  }
}
// Close the driver once the work is done
createNode().finally(() => driver.close());
Run this file and then we can see the output:
Node {
identity: Integer { low: 169, high: 0 },
labels: [ 'Person' ],
properties: { name: 'Neo4j', age: 20 },
elementId: '4:e920355b-215c-4a27-aaab-8d40cf5dfa18:169'
}
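The identity field is printed as an Integer with low and high parts because the driver represents 64-bit integers as two 32-bit halves. The sketch below shows how the two halves combine into one number. The `toSafeNumber` helper and the `IntegerParts` interface are hypothetical names used only for illustration; in real code the driver's own conversion helpers (such as `toNumber()` on its Integer type) are likely the better choice.

```typescript
// Shape of the printed value; the real driver class has more members.
interface IntegerParts { low: number; high: number; }

const TWO_POW_32 = 4294967296;

// Combine the signed high word and the unsigned low word into one number.
// Throws when the result cannot be represented exactly as a JS number.
function toSafeNumber(value: IntegerParts): number {
  const combined = value.high * TWO_POW_32 + (value.low >>> 0);
  if (!Number.isSafeInteger(combined)) {
    throw new RangeError("value does not fit into a safe JavaScript integer");
  }
  return combined;
}

console.log(toSafeNumber({ low: 169, high: 0 })); // 169
```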
3. Neo4j Modules
A Neo4j graph database consists of the following modules:
- Node
- Attribute
- Relationship
- Label
- Data Browser
3.1 Node
A node is the basic unit of a graph. It contains attributes stored as key-value pairs.
EmployeeNode = {
EmpName: "Neo4j",
EmpNumber: "123456",
Salary: "3500",
};
In this example, the name of the node is EmployeeNode.
3.2 Attribute
Attributes are described by properties, which are key-value pairs used to describe graph nodes and relationships.
The key is a string and the value can be of any Neo4j data type.
3.3 Relationship
Relationships connect two nodes, and each relationship contains a starting node and an ending node.
Just like attributes, relationships can also contain key-value properties.
3.4 Label
A label associates a common name with a group of nodes, and each node can carry zero or more labels. A relationship does not carry labels; instead, it has exactly one relationship type (for example, FRIENDS_WITH).
We can add new labels to nodes, and we can remove labels from them.
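To make the pieces described in this section concrete, here is a minimal sketch of the property graph model as TypeScript types. The type and function names are hypothetical and simplified (real property values can also be lists, for example), and the sketch assumes that a node carries a list of labels while a relationship carries a single type.

```typescript
type PropertyValue = string | number | boolean;

interface GraphNode {
  labels: string[];                           // zero or more labels
  properties: Record<string, PropertyValue>;  // key-value properties
}

interface GraphRelationship {
  type: string;                               // exactly one relationship type
  start: GraphNode;                           // starting node
  end: GraphNode;                             // ending node
  properties: Record<string, PropertyValue>;
}

const employee: GraphNode = {
  labels: ["Employee"],
  properties: { EmpName: "Neo4j", EmpNumber: "123456", Salary: "3500" },
};
const company: GraphNode = { labels: ["Company"], properties: { name: "TechCorp" } };
const worksAt: GraphRelationship = {
  type: "WORKS_AT",
  start: employee,
  end: company,
  properties: { since: 2015 },
};

// Render a relationship in a Cypher-like pattern notation.
function describeRelationship(rel: GraphRelationship): string {
  return "(:" + rel.start.labels.join(":") + ")-[:" + rel.type + "]->(:" + rel.end.labels.join(":") + ")";
}

console.log(describeRelationship(worksAt)); // (:Employee)-[:WORKS_AT]->(:Company)
```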
3.5 Data Browser
Data Browser is a GUI used to run Cypher commands and view their output; we can also use it to export the output.
4. Neo4j CQL (Cypher Query Language)
4.1 Neo4j Data Type
| Data Type | Description |
|---|---|
| boolean | |
| byte | int8 |
| short | int16 |
| int | int32 |
| long | int64 |
| float | float32 |
| double | float64 |
| char | utf-16 |
| String | |
4.2 CREATE
The CREATE command is used to add new nodes, relationships or entire graph structures to the database. In addition, CREATE can be combined with RETURN to fetch the newly created data.
Here are some examples of how to use the CREATE command:
// Create a Node
CREATE (n:Person {name: 'John', age: 30});
// Create Multiple Nodes
CREATE (a:City {name: 'Paris'}), (b:City {name: 'New York'});
// Create Relations Between Nodes
CREATE (a:Person {name: 'John'})-[:FRIENDS_WITH]->(b:Person {name: 'Jane'});
// Create Multiple Relations
CREATE (a)-[:FRIENDS_WITH]->(b), (b)-[:FRIENDS_WITH]->(c);
// Create Complex Graph Structures
CREATE (a:Person {name: 'Alice'})-[:WORKS_AT]->(c:Company {name: 'TechCorp'})<-[:WORKS_AT]-(b:Person {name: 'Bob'});
// Return Created Data
CREATE (n:Person {name: 'John', age: 30}) RETURN n;
The general syntax of the CREATE command is as follows:
CREATE (<node-name>:<label-name> <properties>) [RETURN <node-name>];
// <properties> -> {<key-values>} | \epsilon
// <key-values> -> [<key-value-with-commas>] <key-value>
// <key-value-with-commas> -> <key-value-with-comma> [<key-value-with-commas>]
// <key-value-with-comma> -> <key-value> ,
// <key-value> -> <key>:<value>
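The grammar above can be turned into a small helper that builds a parameterized CREATE statement. This is only a sketch under stated assumptions: `buildCreate`, `CypherStatement` and the identifier check are hypothetical names, not part of any Neo4j library. Labels and property keys cannot be sent as query parameters, so the sketch validates them before inlining, while the values travel as parameters.

```typescript
type ParamValue = string | number | boolean;

interface CypherStatement {
  text: string;
  params: Record<string, ParamValue>;
}

// Conservative pattern for labels and property keys that are inlined into the query text.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildCreate(label: string, properties: Record<string, ParamValue>): CypherStatement {
  if (!IDENTIFIER.test(label)) {
    throw new Error("invalid label: " + label);
  }
  const pairs: string[] = [];
  for (const key of Object.keys(properties)) {
    if (!IDENTIFIER.test(key)) {
      throw new Error("invalid property key: " + key);
    }
    pairs.push(key + ": $" + key); // <key>:<value>, with the value as a parameter
  }
  // <properties> may be empty (the epsilon case in the grammar)
  const props = pairs.length > 0 ? " {" + pairs.join(", ") + "}" : "";
  return { text: "CREATE (n:" + label + props + ") RETURN n", params: properties };
}

const createJohn = buildCreate("Person", { name: "John", age: 30 });
console.log(createJohn.text); // CREATE (n:Person {name: $name, age: $age}) RETURN n
```

The resulting text and params could then be passed to `session.run(text, params)` in the same way as in the connection test earlier in this document.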
4.3 MATCH
The MATCH command is used to retrieve nodes, relationships or entire graph structures from a Neo4j database.
The basic syntax of the MATCH command is as follows:
MATCH (<node-name>:<label-name>) RETURN <node-name>;
Here are some examples of how to use the MATCH command:
// match a node with a specific label
MATCH (p:Person) RETURN p;
// match a node with specific properties
MATCH (p:Person {name: 'John'}) RETURN p;
// match relationships between nodes
MATCH (p:Person)-[:FRIENDS_WITH]->(q:Person) RETURN p, q;
// match patterns with both nodes and relationships
MATCH (a:Person)-[r:WORKS_AT]->(c:Company) RETURN a, r, c;
// match multiple nodes
MATCH (a:Person), (b:City) RETURN a, b;
// match nodes with multiple labels
MATCH (m:Movie:Drama) RETURN m;
// match nodes with conditional properties
MATCH (p:Person) WHERE p.age > 30 RETURN p;
// match nodes using wildcards
MATCH (p)-[r]->(q) RETURN p, r, q;
// match nodes without a relationship
MATCH (p:Person) WHERE NOT (p)-[:FRIENDS_WITH]->() RETURN p;
// match nodes using pattern variables
MATCH path = (p:Person)-[:FRIENDS_WITH*]->(q:Person) RETURN path;
4.4 DELETE and REMOVE
The DELETE command is used to remove nodes and relationships from the graph.
Here are some examples:
// delete a node
MATCH (n:Person {name: 'John'}) DELETE n;
// delete a relationship without deleting the nodes
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) DELETE r;
// delete a node with its relationships
MATCH (n:Person {name: 'John'})-[r]-() DELETE r, n;
The REMOVE command is used to remove labels or properties from nodes or relationships.
Here are some examples:
// remove a label from a node
MATCH (n:Person {name: 'John'}) REMOVE n:Person;
// remove multiple labels
MATCH (n {name: 'John'}) REMOVE n:Person:Employee;
// remove a property from a node
MATCH (n:Person {name: 'John'}) REMOVE n.age;
// remove a property from a relationship
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) REMOVE r.since;
A node cannot be deleted while it still has relationships. Delete those relationships first, or use DETACH DELETE to delete the node together with its relationships.
After REMOVE, the node or relationship itself remains; only the label or property is removed.
==DO NOT USE DELETE OR REMOVE WITHOUT MATCH==
4.5 SET
The SET command is used to add or update properties or labels on nodes or relationships.
// update properties
MATCH (n:Person {name: 'John'}) SET n.age = 30;
MATCH (n:Person {name: 'John'}) SET n.age = 30, n.city = 'New York';
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) SET r.since = 2015;
// add labels
MATCH (n:Person {name: 'John'}) SET n:Employee;
MATCH (n:Person {name: 'John'}) SET n:Employee:Manager;
// remove properties
MATCH (n:Person {name: 'John'}) SET n.age = NULL;
MATCH (n:Person {name: 'John'}) SET n.age = NULL, n.city = NULL;
4.6 ORDER BY
The ORDER BY clause is used to sort the results of a query based on one or more properties of nodes or relationships.
Syntax is as follows:
MATCH (n) RETURN n ORDER BY n.<property> [ASC|DESC]
ASC (ascending) is the default sorting order. DESC is used for descending order.
4.7 UNION
The UNION clause is used to combine the results of two or more MATCH or RETURN queries. The key points to keep in mind when using UNION are:
- The result sets from the queries being combined must have the same number of columns.
- The columns must have the same names in every query being combined.
- By default, UNION removes duplicate rows. If you want to include duplicates, use UNION ALL.
Syntax is as follows:
<query1>
UNION [ALL]
<query2>
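The difference between UNION and UNION ALL can be illustrated on plain in-memory rows. This is a sketch of the semantics only, not of how Neo4j implements it; `unionAll`, `union` and the `Row` type are hypothetical names.

```typescript
type Row = Record<string, string | number>;

// UNION ALL: plain concatenation, duplicates are kept.
function unionAll(left: Row[], right: Row[]): Row[] {
  return left.concat(right);
}

// UNION: concatenation followed by removal of duplicate rows,
// comparing rows by the values of the given columns.
function union(left: Row[], right: Row[], columns: string[]): Row[] {
  const seen = new Set<string>();
  const result: Row[] = [];
  for (const row of unionAll(left, right)) {
    const key = JSON.stringify(columns.map((c) => row[c]));
    if (!seen.has(key)) {
      seen.add(key);
      result.push(row);
    }
  }
  return result;
}

const friends: Row[] = [{ name: "Alice" }, { name: "Bob" }];
const colleagues: Row[] = [{ name: "Bob" }];
console.log(unionAll(friends, colleagues).length);      // 3
console.log(union(friends, colleagues, ["name"]).length); // 2
```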
4.8 LIMIT and SKIP
LIMIT and SKIP are used to control the number of results returned and to paginate through results.
LIMIT specifies the maximum number of rows to return. SKIP specifies the number of rows to skip before starting to return results.
These clauses are typically used together for pagination.
Syntax is as follows:
MATCH (n) RETURN n ORDER BY n.<property> SKIP <number> LIMIT <number>;
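For pagination, the SKIP value is usually derived from a page number and a page size. The sketch below shows that arithmetic; `pageQuery`, the `Person` label and the `name` property are assumptions made for illustration.

```typescript
interface PageQuery {
  text: string;
  params: { skip: number; limit: number };
}

// Pages are numbered from 1. Page p skips the (p - 1) * pageSize rows before it.
function pageQuery(page: number, pageSize: number): PageQuery {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return {
    text: "MATCH (n:Person) RETURN n ORDER BY n.name SKIP $skip LIMIT $limit",
    params: { skip: (page - 1) * pageSize, limit: pageSize },
  };
}

console.log(pageQuery(3, 10).params); // { skip: 20, limit: 10 }
```

An ORDER BY is included because, without a stable ordering, consecutive pages are not guaranteed to be consistent. When sending these parameters through the JavaScript driver, the numbers may need to be wrapped with `neo4j.int(...)`, since SKIP and LIMIT expect integer values.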
4.9 MERGE
The MERGE clause is used to ensure that a specific pattern (nodes, relationships, or both) exists in the graph. If the pattern does not exist, Cypher will create it. If the pattern already exists, it will return it without creating duplicates.
MERGE can be thought of as a combination of MATCH and CREATE.
Here are some examples:
// merge a node
MERGE (p:Person {name: 'Alice'}) RETURN p;
MERGE (p:Person {name: 'Alice', age: 30}) RETURN p;
// merge a relationship
MATCH (p1:Person {name: 'Alice'}), (p2:Person {name: 'Bob'}) MERGE (p1)-[:FRIEND]->(p2) RETURN p1, p2;
// use `ON CREATE` and `ON MATCH`
MERGE (p:Person {name: 'Alice'})
ON CREATE SET p.createdAt = timestamp()
ON MATCH SET p.lastSeen = timestamp()
RETURN p;
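The match-or-create behaviour of MERGE with ON CREATE and ON MATCH can be mimicked on an in-memory store. This is only a sketch of the semantics, keyed on the name property; `mergePerson` and `PersonRecord` are hypothetical names, and the timestamp is passed in explicitly to keep the example deterministic.

```typescript
interface PersonRecord {
  name: string;
  createdAt?: number;
  lastSeen?: number;
}

// Match-or-create keyed on name, mimicking MERGE (p:Person {name: ...}).
function mergePerson(store: Map<string, PersonRecord>, personName: string, now: number): PersonRecord {
  const existing = store.get(personName);
  if (existing !== undefined) {
    existing.lastSeen = now; // corresponds to ON MATCH SET p.lastSeen = timestamp()
    return existing;
  }
  // corresponds to ON CREATE SET p.createdAt = timestamp()
  const created: PersonRecord = { name: personName, createdAt: now };
  store.set(personName, created);
  return created;
}

const people = new Map<string, PersonRecord>();
const firstMerge = mergePerson(people, "Alice", 1000);  // creates the record
const secondMerge = mergePerson(people, "Alice", 2000); // matches the existing record
console.log(people.size, firstMerge === secondMerge, secondMerge.createdAt, secondMerge.lastSeen);
// 1 true 1000 2000
```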
4.10 IN
The IN operator is used to check if a specific value exists within a list or collection.
Syntax is as follows:
MATCH (n:Label)
WHERE n.property IN [value1, value2, value3]
RETURN n;
We can also use IN with a list built earlier in the same query, for example with COLLECT:
MATCH (p:Person)
WITH COLLECT(p.name) AS names
MATCH (o:Order)
WHERE o.customerName IN names
RETURN o;
4.11 NULL
NULL represents the absence of a value or the concept of "nothing."
Here are some key points regarding NULL in Neo4j:
- NULL signifies that a property does not hold any value. For instance, if a node has a property age and this property is not set, it will be considered NULL.
- NULL is not considered a value of any data type (e.g., integer, string). Instead, it is a distinct state that indicates the lack of a value.
- We can check for NULL using the IS NULL or IS NOT NULL conditions in Cypher queries.
- If we try to access a property that does not exist on a node, it will return NULL.
- Many aggregation functions in Cypher, such as COUNT, ignore NULL values.
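Two of these points, comparison with NULL and aggregation over NULL, can be illustrated with a small sketch. It models Cypher's NULL as the TypeScript value null; the function names are hypothetical and the code only imitates the semantics described above.

```typescript
type Nullable<T> = T | null;

// In Cypher, comparing anything with NULL yields NULL rather than true or false,
// which is why IS NULL / IS NOT NULL are needed to test for it.
function cypherEquals(a: Nullable<number>, b: Nullable<number>): Nullable<boolean> {
  if (a === null || b === null) {
    return null;
  }
  return a === b;
}

// COUNT(<expression>) counts only the rows where the expression is not NULL.
function countNonNull(values: Nullable<number>[]): number {
  return values.filter((v) => v !== null).length;
}

const ages: Nullable<number>[] = [30, null, 25];
console.log(cypherEquals(null, null)); // null
console.log(cypherEquals(30, 30));     // true
console.log(countNonNull(ages));       // 2
```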
5. id Property
The id property refers to a unique identifier for nodes and relationships within the database. This identifier is automatically assigned by Neo4j and serves as a way to uniquely identify each element in the graph.
Here are some key points regarding the id property:
- When creating a new node or relationship, Neo4j automatically assigns a unique id to it. This id is not user-defined and cannot be changed.
- Each node and relationship in Neo4j has a unique id within its respective graph. This means that no two nodes will have the same id, and no two relationships will have the same id.
- The id is an integer value, starting from 0 for the first node or relationship created and generally increasing as new nodes or relationships are added. The ids of deleted elements may be reused.
- We can access the id of a node or relationship in Cypher queries using the id() function. In Neo4j 5, id() is deprecated in favor of elementId(), which returns a string like the elementId shown in the driver output earlier.
- The id property is specific to the Neo4j database instance. If we export and import data, the id values may change.
- The id property is not meant to be used as a business key or reference in application logic. Instead, it is primarily for internal identification purposes. For application-level unique identification, consider defining our own unique properties.
6. Direction
Direction specifies the way in which relationships are connected between nodes, indicating a one-way connection from one node to another.
In Neo4j, all relationships are directed. This means that each relationship has a specific start node and end node, indicating a one-way connection.
Each relationship can have a type, which describes the nature of the connection. The direction is indicated by the arrow in the notation.
(Alice)-[:FRIENDS_WITH]->(Bob)
(Bob)<-[:WORKS_WITH]-(Charlie)
While relationships are inherently directed, we can create a structure that allows for bidirectional navigation. This can be achieved by creating two relationships, one in each direction:
(A)-[:FRIENDS_WITH]->(B)
(B)-[:FRIENDS_WITH]->(A)
If you want to find relationships without regard to direction, you can use the relationship pattern without arrows. For example:
MATCH (a:Person)-[:FRIENDS_WITH]-(b:Person) RETURN b;
Relationships are always directed and can be queried based on their direction. The direction concept is essential for accurately modeling and querying graph data.
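The difference between a directed pattern and a pattern without arrows can be sketched on an in-memory edge list. The `Edge` type and the two functions are hypothetical; they only imitate what the corresponding MATCH patterns would return.

```typescript
interface Edge {
  type: string;
  from: string;
  to: string;
}

// (a)-[:TYPE]->(b): follow only edges that leave `person`.
function outgoingNeighbours(edges: Edge[], person: string, type: string): string[] {
  return edges
    .filter((e) => e.type === type && e.from === person)
    .map((e) => e.to);
}

// (a)-[:TYPE]-(b): follow edges regardless of their direction.
function anyDirectionNeighbours(edges: Edge[], person: string, type: string): string[] {
  const result: string[] = [];
  for (const e of edges) {
    if (e.type !== type) {
      continue;
    }
    if (e.from === person) {
      result.push(e.to);
    } else if (e.to === person) {
      result.push(e.from);
    }
  }
  return result;
}

const friendships: Edge[] = [{ type: "FRIENDS_WITH", from: "Alice", to: "Bob" }];
console.log(outgoingNeighbours(friendships, "Bob", "FRIENDS_WITH"));     // []
console.log(anyDirectionNeighbours(friendships, "Bob", "FRIENDS_WITH")); // [ 'Alice' ]
```

With a single stored relationship from Alice to Bob, the directed lookup from Bob finds nothing, while the undirected lookup still finds Alice. This is why storing a second relationship in the opposite direction is often unnecessary.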
7. CQL Function
7.1 String Function
| Function | Description |
|---|---|
| UPPER | UPPER(<input-string>) |
| LOWER | LOWER(<input-string>) |
| SUBSTRING | SUBSTRING(<input-string>,<start-index>,<length>) |
| REPLACE | REPLACE(<input-string>,<search-string>,<replacement-string>) |
7.2 Aggregation Function
- COUNT
- MAX
- MIN
- SUM
- AVG
7.3 Relationship Function
| Function | Description |
|---|---|
| STARTNODE | find the start node of a relationship |
| ENDNODE | find the end node of a relationship |
| ID | find the id of a relationship |
| TYPE | find the type of a relationship, represented as a string |
8. INDEX
An index in CQL is similar to one in SQL. In Neo4j 5, an index is created with FOR ... ON and dropped by name:
CREATE INDEX <index-name> FOR (n:<label-name>) ON (n.<property-name>);
DROP INDEX <index-name>;
9. UNIQUE
A uniqueness constraint can be added to a property of nodes or relationships. In Neo4j 5, the syntax for nodes is:
CREATE CONSTRAINT <constraint-name> FOR (n:<label-name>) REQUIRE n.<property-name> IS UNIQUE;
DROP CONSTRAINT <constraint-name>;
10. Other Things
We will use Gremlin and G.V to access Neo4j in the ArkAnalyzer project.
In the future, we can also use Neo4j with the Spring framework in other projects.
