Neo4j
Neo4j is a graph database. We will use it to store the CG (Call Graph) and CFG (Control Flow Graph) generated by ArkAnalyzer.
We will develop an ArkAnalyzer extension called Ark2Graph, which works in three steps:
- Get the CG and CFG from ArkAnalyzer.
- Traverse the CG and CFG.
- Save the results into Neo4j.
We therefore need a Neo4j server deployed on the machine where we develop this extension.
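The three Ark2Graph steps listed above can be sketched as a breadth-first traversal that emits parameterized Cypher statements. This is only a minimal sketch: the `CallGraph` shape, the `Method` label, the `CALLS` relationship type and the `signature` property are assumptions made for illustration and are not ArkAnalyzer's real data model.

```typescript
// Hypothetical, simplified call graph; NOT ArkAnalyzer's real data model.
interface CallGraph {
  // method signature -> signatures of the methods it calls
  edges: Map<string, string[]>;
}

interface CypherStatement {
  text: string;
  params: Record<string, string>;
}

// Breadth-first traversal from an entry method. Emits one MERGE per visited
// method and one MERGE per call edge, so re-running it creates no duplicates.
function callGraphToCypher(graph: CallGraph, entry: string): CypherStatement[] {
  const statements: CypherStatement[] = [];
  const visited = new Set<string>([entry]);
  const queue: string[] = [entry];

  while (queue.length > 0) {
    const caller = queue.shift() as string;
    statements.push({
      text: "MERGE (m:Method {signature: $signature})",
      params: { signature: caller },
    });
    for (const callee of graph.edges.get(caller) || []) {
      statements.push({
        text:
          "MERGE (a:Method {signature: $caller}) " +
          "MERGE (b:Method {signature: $callee}) " +
          "MERGE (a)-[:CALLS]->(b)",
        params: { caller: caller, callee: callee },
      });
      if (!visited.has(callee)) {
        visited.add(callee);
        queue.push(callee);
      }
    }
  }
  return statements;
}
```

Each returned statement could then be passed to the driver as `session.run(statement.text, statement.params)`. Keeping the values in `params` rather than concatenating them into the query text avoids quoting problems in method signatures.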
1. Download and Run Neo4j Server
First, we can download the installer from here.
Before downloading, we need to fill in a form with our personal information, after which the website displays an activation key. We need to copy this key and save it to a local file.
Once the download finishes, we run the installer to install Neo4j. We choose where to install the application and where to store its data, and then the server starts automatically. The Neo4j Desktop window then opens, where we can view the status of the server and manage it.
Don't close the command window that opens automatically, or errors will occur.
2. Connect to Neo4j Server
2.1 Set Username and Password
First, we open Neo4j Browser from Neo4j Desktop, and then run Cypher in the console:
ALTER USER neo4j SET PASSWORD 'neo4j123';
This command will change the password of user neo4j.
Or we can create a new user and grant admin privilege to it:
CREATE USER new_user SET PASSWORD 'new_password' CHANGE NOT REQUIRED;
GRANT ROLE admin TO new_user;
Then we can use this username and password to open http://localhost:7474 in a browser, or to connect to the Neo4j server at neo4j://localhost:7687 from a terminal.
2.2 Connect to Neo4j Server in TS
First we need to install the driver of Neo4j:
npm install neo4j-driver
Then we need to edit the Neo4j server configuration (neo4j.conf) so that the server accepts connections on all network interfaces:
# Use 0.0.0.0 to bind to all network interfaces on the machine
server.default_listen_address=0.0.0.0
Then we can create a TypeScript file to test this connection:
import neo4j from "neo4j-driver";
// Change this to the address where your Neo4j server is running
const uri = "bolt://172.18.48.1:7687";
const username = "neo4j";
const password = "neo4j123";
const driver = neo4j.driver(uri, neo4j.auth.basic(username, password));
// Create one Person node and print the node returned by the query
async function createNode(): Promise<void> {
  // Open the session inside the function so that it is always closed in `finally`
  const session = driver.session();
  try {
    const result = await session.run(
      "CREATE (n:Person {name: $name, age: $age}) RETURN n",
      { name: "Neo4j", age: 20 }
    );
    console.log(result.records[0].get("n"));
  } catch (error) {
    console.error("Error creating node:", error);
  } finally {
    await session.close();
  }
}
// Close the driver once the query has finished
createNode().then(() => driver.close());
Run this file and then we can see the output:
Node {
identity: Integer { low: 169, high: 0 },
labels: [ 'Person' ],
properties: { name: 'Neo4j', age: 20 },
elementId: '4:e920355b-215c-4a27-aaab-8d40cf5dfa18:169'
}
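The `identity` field above is printed as an `Integer` with a `low` and a `high` half because Neo4j integers are 64-bit, while a JavaScript number cannot represent every 64-bit value exactly. The driver's `Integer` class has its own conversion methods such as `toNumber()`. The sketch below only illustrates what the two halves encode, using a hypothetical plain `Int64Parts` shape instead of the driver class.

```typescript
// Hypothetical plain shape mirroring the printed `Integer { low, high }`;
// the real driver class offers its own conversion methods.
interface Int64Parts {
  low: number; // lower 32 bits, stored as a signed 32-bit integer
  high: number; // upper 32 bits, stored as a signed 32-bit integer
}

const TWO_POW_32 = 4294967296;

// Combine the two halves into a JavaScript number, refusing values that a
// double-precision number cannot represent exactly.
function partsToNumber(parts: Int64Parts): number {
  // `>>> 0` reinterprets the low half as an unsigned 32-bit value
  const value = parts.high * TWO_POW_32 + (parts.low >>> 0);
  if (!Number.isSafeInteger(value)) {
    throw new RangeError("value is outside the safe integer range");
  }
  return value;
}

const identity = partsToNumber({ low: 169, high: 0 }); // 169
```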
3. Neo4j Modules
A Neo4j graph database consists of the following modules:
- Node
- Attribute
- Relationship
- Label
- Data Browser
3.1 Node
A node is the basic unit of a graph. It contains attributes stored as key-value pairs.
EmployeeNode = {
EmpName: "Neo4j",
EmpNumber: "123456",
Salary: "3500",
};
In this example, the name of the node is EmployeeNode.
3.2 Attribute
Attributes are described by properties, which are key-value pairs used to describe graph nodes and relationships. The key is a string and the value can be any property type supported by Neo4j.
3.3 Relationship
Relationships connect two nodes, and each relationship has a start node and an end node.
Just like nodes, relationships can also contain key-value properties.
3.4 Label
A label associates a common name with a group of nodes, and each node can carry zero or more labels. A relationship instead has exactly one type, which plays a similar role.
We can add new labels to nodes and remove labels from nodes, whereas the type of a relationship is fixed when the relationship is created.
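The building blocks described in sections 3.1 to 3.4 can be modeled in a few lines of TypeScript. This is a minimal in-memory sketch, not how Neo4j stores data; the interface names and the `WORKS_AT` type are assumptions made for illustration. It assumes that a node carries a list of labels while a relationship carries exactly one type.

```typescript
// Simplified property values; Neo4j supports more types (lists, temporal, ...).
type PropertyValue = string | number | boolean;

interface GraphNode {
  id: number;
  labels: string[]; // zero or more labels
  properties: Record<string, PropertyValue>;
}

interface GraphRelationship {
  type: string; // exactly one type
  startId: number; // id of the start node
  endId: number; // id of the end node
  properties: Record<string, PropertyValue>;
}

// Return a copy of the node with the label added (no duplicates).
function addLabel(node: GraphNode, label: string): GraphNode {
  if (node.labels.indexOf(label) !== -1) return node;
  return { id: node.id, labels: node.labels.concat([label]), properties: node.properties };
}

// Return a copy of the node without the label; the node itself remains.
function removeLabel(node: GraphNode, label: string): GraphNode {
  return {
    id: node.id,
    labels: node.labels.filter((l) => l !== label),
    properties: node.properties,
  };
}

const employee: GraphNode = {
  id: 1,
  labels: ["Employee"],
  properties: { EmpName: "Neo4j", EmpNumber: "123456", Salary: "3500" },
};
const company: GraphNode = { id: 2, labels: ["Company"], properties: { name: "TechCorp" } };
const worksAt: GraphRelationship = {
  type: "WORKS_AT",
  startId: employee.id,
  endId: company.id,
  properties: { since: 2020 },
};
```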
3.5 Data Browser
Data Browser is a GUI for running Cypher commands and viewing their output, and we can also use it to export the output.
4. Neo4j CQL (Cypher Query Language)
4.1 Neo4j Data Type
Data Type | Description |
---|---|
boolean | true or false |
byte | int8 |
short | int16 |
int | int32 |
long | int64 |
float | float32 |
double | float64 |
char | UTF-16 character |
String | sequence of characters |
4.2 CREATE
The CREATE command is used to add new nodes, relationships or entire graph structures to the database. In addition, CREATE can be combined with RETURN to fetch the newly created data.
Here are some examples of how to use the CREATE command:
// Create a Node
CREATE (n:Person {name: 'John', age: 30});
// Create Multiple Nodes
CREATE (a:City {name: 'Paris'}), (b:City {name: 'New York'});
// Create Relations Between Nodes
CREATE (a:Person {name: 'John'})-[:FRIENDS_WITH]->(b:Person {name: 'Jane'});
// Create Multiple Relations
CREATE (a)-[:FRIENDS_WITH]->(b), (b)-[:FRIENDS_WITH]->(c);
// Create Complex Graph Structures
CREATE (a:Person {name: 'Alice'})-[:WORKS_AT]->(c:Company {name: 'TechCorp'})<-[:WORKS_AT]-(b:Person {name: 'Bob'});
// Return Created Data
CREATE (n:Person {name: 'John', age: 30}) RETURN n;
The syntax of the CREATE command is as follows:
CREATE (<node-name>:<label-name> <properties>) [RETURN <node-name>];
// <properties> -> {<key-values>} | \epsilon
// <key-values> -> [<key-value-with-commas>] <key-value>
// <key-value-with-commas> -> <key-value-with-comma> [<key-value-with-commas>]
// <key-value-with-comma> -> <key-value> ,
// <key-value> -> <key>:<value>
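The grammar above can be turned into a small query builder. The sketch below renders a CREATE statement from a variable name, a label and a property map, and passes the property values as query parameters instead of writing them into the text. The `buildCreate` helper and the identifier check are assumptions made for illustration; they are not part of the Neo4j driver.

```typescript
interface CypherQuery {
  text: string;
  params: Record<string, unknown>;
}

// Names are interpolated into the query text, so only plain identifiers are accepted.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function assertIdentifier(name: string): void {
  if (!IDENTIFIER.test(name)) {
    throw new Error(`unsafe identifier: ${name}`);
  }
}

// CREATE (<node-name>:<label-name> <properties>) [RETURN <node-name>]
function buildCreate(
  variable: string,
  label: string,
  properties: Record<string, unknown>,
  returnNode: boolean
): CypherQuery {
  assertIdentifier(variable);
  assertIdentifier(label);
  const keys = Object.keys(properties);
  keys.forEach(assertIdentifier);
  // <properties> is either empty or {<key>: $<key>, ...}
  const props =
    keys.length === 0 ? "" : " {" + keys.map((k) => `${k}: $${k}`).join(", ") + "}";
  const tail = returnNode ? ` RETURN ${variable}` : "";
  return { text: `CREATE (${variable}:${label}${props})${tail}`, params: properties };
}

const query = buildCreate("n", "Person", { name: "John", age: 30 }, true);
// query.text is "CREATE (n:Person {name: $name, age: $age}) RETURN n"
```

The resulting object can be handed to the driver as `session.run(query.text, query.params)`.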
4.3 MATCH
The MATCH command is used to retrieve nodes, relationships or entire graph structures from a Neo4j database.
The basic syntax of the MATCH command is as follows:
MATCH (<node-name>:<label-name>) RETURN <node-name>;
Here are some examples of how to use the MATCH command:
// match a node with a specific label
MATCH (p:Person) RETURN p;
// match a node with specific properties
MATCH (p:Person {name: 'John'}) RETURN p;
// match relationships between nodes
MATCH (p:Person)-[:FRIENDS_WITH]->(q:Person) RETURN p, q;
// match patterns with both nodes and relationships
MATCH (a:Person)-[r:WORKS_AT]->(c:Company) RETURN a, r, c;
// match multiple nodes
MATCH (a:Person), (b:City) RETURN a, b;
// match nodes with multiple labels
MATCH (m:Movie:Drama) RETURN m;
// match nodes with conditional properties
MATCH (p:Person) WHERE p.age > 30 RETURN p;
// match any two nodes connected by any relationship
MATCH (p)-[r]->(q) RETURN p, r, q;
// match nodes without a relationship
MATCH (p:Person) WHERE NOT (p)-[:FRIENDS_WITH]->() RETURN p;
// match variable-length paths and bind each one to a path variable
MATCH path = (p:Person)-[:FRIENDS_WITH*]->(q:Person) RETURN path;
4.4 DELETE and REMOVE
The DELETE command is used to remove nodes and relationships from the graph.
Here are some examples:
// delete a node
MATCH (n:Person {name: 'John'}) DELETE n;
// delete a relationship without deleting the nodes
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) DELETE r;
// delete a node together with all of its relationships
MATCH (n:Person {name: 'John'}) DETACH DELETE n;
The REMOVE command is used to remove labels from nodes, or properties from nodes or relationships.
Here are some examples:
// remove a label from a node
MATCH (n:Person {name: 'John'}) REMOVE n:Person;
// remove multiple labels
MATCH (n {name: 'John'}) REMOVE n:Person:Employee;
// remove a property from a node
MATCH (n:Person {name: 'John'}) REMOVE n.age;
// remove a property from a relationship
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) REMOVE r.since;
A node cannot be deleted while it still has relationships: those relationships must be deleted first, or DETACH DELETE must be used.
After REMOVE, the node or relationship itself remains; only the label or property is removed.
DO NOT USE DELETE OR REMOVE WITHOUT MATCH.
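The rule that a node cannot be deleted while it still has relationships can be illustrated with a tiny in-memory graph. This is only a sketch of the semantics, not of Neo4j's implementation; the `TinyGraph` class is hypothetical, and its `detach` flag mimics what Cypher's DETACH DELETE does.

```typescript
interface Rel {
  id: number;
  start: number;
  end: number;
}

class TinyGraph {
  private nodes = new Set<number>();
  private rels: Rel[] = [];

  addNode(id: number): void {
    this.nodes.add(id);
  }

  addRel(rel: Rel): void {
    this.rels.push(rel);
  }

  // Plain delete fails while relationships are attached;
  // with detach = true the attached relationships are removed first.
  deleteNode(id: number, detach: boolean = false): void {
    const attached = this.rels.filter((r) => r.start === id || r.end === id);
    if (attached.length > 0 && !detach) {
      throw new Error(`node ${id} still has ${attached.length} relationship(s)`);
    }
    this.rels = this.rels.filter((r) => r.start !== id && r.end !== id);
    this.nodes.delete(id);
  }

  hasNode(id: number): boolean {
    return this.nodes.has(id);
  }

  relCount(): number {
    return this.rels.length;
  }
}
```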
4.5 SET
The SET command is used to add or update properties or labels on nodes or relationships.
// update properties
MATCH (n:Person {name: 'John'}) SET n.age = 30;
MATCH (n:Person {name: 'John'}) SET n.age = 30, n.city = 'New York';
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) SET r.since = 2015;
// add labels
MATCH (n:Person {name: 'John'}) SET n:Employee;
MATCH (n:Person {name: 'John'}) SET n:Employee:Manager;
// remove properties
MATCH (n:Person {name: 'John'}) SET n.age = NULL;
MATCH (n:Person {name: 'John'}) SET n.age = NULL, n.city = NULL;
4.6 ORDER BY
The ORDER BY clause is used to sort the results of a query based on one or more properties of nodes or relationships.
Syntax is as follows:
MATCH (n) RETURN n ORDER BY n.<property> [ASC|DESC]
ASC (ascending) is the default sorting order. DESC is used for descending order.
4.7 UNION
The UNION clause is used to combine the results of two or more queries, each of which ends with its own RETURN. The key points to keep in mind when using UNION are:
- The result sets from the queries being combined must have the same number of columns.
- The columns must have the same names in every query.
- By default, UNION removes duplicate rows. If you want to include duplicates, use UNION ALL.
Syntax is as follows:
<query1>
UNION [ALL]
<query2>
4.8 LIMIT and SKIP
LIMIT and SKIP are used to control the number of results returned and to paginate through results.
LIMIT specifies the maximum number of rows to return. SKIP specifies the number of rows to skip before starting to return results.
These clauses are typically used together for pagination.
Syntax is as follows:
MATCH (n) RETURN n ORDER BY n.<property> SKIP <number> LIMIT <number>;
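For pagination, the SKIP value is usually derived from a page number and a page size. The sketch below shows that arithmetic with a hypothetical `pageQuery` helper that uses 1-based page numbers; the helper and its parameter names are assumptions made for illustration.

```typescript
interface PageQuery {
  text: string;
  params: { skip: number; limit: number };
}

// Names are interpolated into the query text, so only plain identifiers are accepted.
const SAFE_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Build a paginated query; `page` is 1-based.
function pageQuery(label: string, orderBy: string, page: number, pageSize: number): PageQuery {
  if (!SAFE_NAME.test(label) || !SAFE_NAME.test(orderBy)) {
    throw new Error("label and property must be plain identifiers");
  }
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return {
    text: `MATCH (n:${label}) RETURN n ORDER BY n.${orderBy} SKIP $skip LIMIT $limit`,
    // Rows before the requested page are skipped
    params: { skip: (page - 1) * pageSize, limit: pageSize },
  };
}

const thirdPage = pageQuery("Person", "name", 3, 10); // skips 20 rows, returns up to 10
```

When these parameters are sent through `neo4j-driver`, plain JavaScript numbers are transmitted as floats, so wrapping them with `neo4j.int(...)` is typically needed for SKIP and LIMIT.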
4.9 MERGE
The MERGE clause is used to ensure that a specific pattern (nodes, relationships, or both) exists in the graph. If the pattern does not exist, Cypher will create it. If the pattern already exists, MERGE matches it without creating duplicates.
MERGE is a combination of MATCH and CREATE.
Here are some examples:
// merge a node
MERGE (p:Person {name: 'Alice'}) RETURN p;
MERGE (p:Person {name: 'Alice', age: 30}) RETURN p;
// merge a relationship
MATCH (p1:Person {name: 'Alice'}), (p2:Person {name: 'Bob'}) MERGE (p1)-[:FRIEND]->(p2) RETURN p1, p2;
// use `ON CREATE` and `ON MATCH`
MERGE (p:Person {name: 'Alice'})
ON CREATE SET p.createdAt = timestamp()
ON MATCH SET p.lastSeen = timestamp()
RETURN p;
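The "match, otherwise create" behavior of MERGE, including ON CREATE and ON MATCH, can be mimicked in memory. This is a sketch of the semantics only; the `Person` shape and the `mergePerson` helper are hypothetical and mirror the last query above.

```typescript
interface Person {
  name: string;
  createdAt?: number;
  lastSeen?: number;
}

// Mimics: MERGE (p:Person {name: $name})
//         ON CREATE SET p.createdAt = $now
//         ON MATCH SET p.lastSeen = $now
//         RETURN p
function mergePerson(store: Person[], name: string, now: number): Person[] {
  const matches = store.filter((p) => p.name === name);
  if (matches.length > 0) {
    // ON MATCH: every matching node is updated, nothing is created
    matches.forEach((p) => {
      p.lastSeen = now;
    });
    return matches;
  }
  // ON CREATE: no match was found, so exactly one new node is created
  const created: Person = { name: name, createdAt: now };
  store.push(created);
  return [created];
}
```

Calling `mergePerson` twice with the same name leaves a single entry in the store: the first call sets `createdAt`, the second only sets `lastSeen`.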
4.10 IN
The IN operator is used to check if a specific value exists within a list or collection.
Syntax is as follows:
MATCH (n:Label)
WHERE n.property IN [value1, value2, value3]
RETURN n;
Also, we can use IN with a list collected earlier in the same query:
MATCH (p:Person)
WITH COLLECT(p.name) AS names
MATCH (o:Order)
WHERE o.customerName IN names
RETURN o;
4.11 NULL
NULL represents the absence of a value or the concept of "nothing."
Here are some key points regarding NULL in Neo4j:
- NULL signifies that a property does not hold any value. For instance, if a node has a property age and this property is not set, it will be considered NULL.
- NULL is not considered a value of any data type (e.g., integer, string). Instead, it is a distinct state that indicates the lack of a value.
- We can check for NULL using the IS NULL or IS NOT NULL conditions in Cypher queries.
- If we try to access a property that does not exist on a node, it will return NULL.
- Many aggregation functions in Cypher, such as COUNT, ignore NULL values.
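The points about missing properties and aggregation can be mimicked in TypeScript. The helpers below are hypothetical and only sketch the behavior described above: reading a missing property yields null, and counting or averaging an expression skips null values. Note that this corresponds to COUNT applied to an expression; COUNT(*) counts rows instead.

```typescript
type Nullable<T> = T | null;

// Reading a property that does not exist yields null, as in Cypher.
function getProperty(
  properties: Record<string, number>,
  key: string
): Nullable<number> {
  return Object.prototype.hasOwnProperty.call(properties, key) ? properties[key] : null;
}

// Like COUNT(expr): null values are not counted.
function countNonNull(values: Nullable<number>[]): number {
  return values.filter((v) => v !== null).length;
}

// Like AVG(expr): null values are skipped; with no values left the result is null.
function avgNonNull(values: Nullable<number>[]): Nullable<number> {
  const present = values.filter((v): v is number => v !== null);
  if (present.length === 0) return null;
  let sum = 0;
  for (const v of present) sum += v;
  return sum / present.length;
}

const ages = [{ age: 30 }, {}, { age: 20 }].map((p) =>
  getProperty(p as Record<string, number>, "age")
);
// ages is [30, null, 20]
```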
5. id Property
The id property refers to a unique identifier for nodes and relationships within the database. This identifier is automatically assigned by Neo4j and serves as a way to uniquely identify each element in the graph.
Here are some key points regarding the id property:
- When creating a new node or relationship, Neo4j automatically assigns a unique id to it. This id is not user-defined and cannot be changed.
- Each node and relationship in Neo4j has a unique id within its respective graph. This means that no two nodes, and no two relationships, will have the same id at the same time.
- The id is an integer value. It typically starts from 0 and increases as new nodes or relationships are added, but Neo4j may reuse the id of a deleted element.
- We can access the id of a node or relationship in Cypher queries using the id() function. In Neo4j 5, id() is deprecated in favor of elementId(), which returns a string like the elementId field in the output of section 2.2.
- The id property is specific to the Neo4j database instance. If we export and import data, the id values may change.
- The id property is not meant to be used as a business key or reference in application logic. Instead, it is primarily for internal identification purposes. For application-level unique identification, consider defining our own unique properties.
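As an illustration of the last point, an application-level key for a method node could be derived from the method's own description instead of from the internal id. The `MethodRef` fields and the key format below are assumptions made for illustration; they are not ArkAnalyzer's signature format.

```typescript
// Hypothetical description of a method; not ArkAnalyzer's real signature type.
interface MethodRef {
  file: string;
  className: string;
  methodName: string;
  paramTypes: string[];
}

// Deterministic key: the same method always maps to the same string,
// regardless of which database instance stores it.
function methodKey(m: MethodRef): string {
  return `${m.file}#${m.className}.${m.methodName}(${m.paramTypes.join(",")})`;
}

// MERGE on the key so that re-importing the same method does not duplicate it.
function mergeMethod(m: MethodRef): { text: string; params: Record<string, string> } {
  return {
    text: "MERGE (n:Method {key: $key}) SET n.name = $name",
    params: { key: methodKey(m), name: m.methodName },
  };
}

const key = methodKey({
  file: "src/app.ts",
  className: "App",
  methodName: "run",
  paramTypes: ["string", "number"],
});
// key is "src/app.ts#App.run(string,number)"
```

Combined with a uniqueness constraint on the `key` property, such a key stays stable across exports and imports.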
6. Direction
Direction specifies which way a relationship points between two nodes.
In Neo4j, all relationships are directed: each relationship has a specific start node and end node, indicating a one-way connection.
Each relationship can have a type, which describes the nature of the connection. The direction is indicated by the arrow in the notation.
(Alice)-[:FRIENDS_WITH]->(Bob)
(Bob)<-[:WORKS_WITH]-(Charlie)
While relationships are inherently directed, we can create a structure that allows for bidirectional navigation. This can be achieved by creating two relationships, one in each direction:
(A)-[:FRIENDS_WITH]->(B)
(B)-[:FRIENDS_WITH]->(A)
If we want to find relationships without regard to direction, we can write the relationship pattern without an arrow. For example:
MATCH (a:Person)-[:FRIENDS_WITH]-(b:Person) RETURN b;
Relationships are always directed and can be queried based on their direction. The direction concept is essential for accurately modeling and querying graph data.
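The difference between matching with and without an arrow can be sketched over a plain list of directed edges. The `neighbors` helper below is hypothetical: `"out"` corresponds to `(a)-[:T]->(b)`, `"in"` to `(a)<-[:T]-(b)`, and `"both"` to the arrow-less pattern `(a)-[:T]-(b)`.

```typescript
type Direction = "out" | "in" | "both";

interface Edge {
  type: string;
  start: string;
  end: string;
}

// Return the nodes connected to `node` by relationships of the given type,
// following them in the requested direction.
function neighbors(edges: Edge[], node: string, type: string, direction: Direction): string[] {
  const result: string[] = [];
  for (const e of edges) {
    if (e.type !== type) continue;
    if (direction !== "in" && e.start === node) result.push(e.end);
    if (direction !== "out" && e.end === node) result.push(e.start);
  }
  return result;
}

const friendships: Edge[] = [{ type: "FRIENDS_WITH", start: "Alice", end: "Bob" }];
const outgoingFromBob = neighbors(friendships, "Bob", "FRIENDS_WITH", "out"); // []
const eitherWayFromBob = neighbors(friendships, "Bob", "FRIENDS_WITH", "both"); // ["Alice"]
```

Only one relationship is stored here, yet the `"both"` lookup finds Alice from Bob, which is why a second relationship in the opposite direction is often unnecessary.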
7. CQL Function
7.1 String Function
Function | Syntax |
---|---|
UPPER | UPPER(<input-string>) |
LOWER | LOWER(<input-string>) |
SUBSTRING | SUBSTRING(<input-string>,<start-index>,<length>) |
REPLACE | REPLACE(<input-string>,<search-string>,<replacement-string>) |
7.2 Aggregation Function
- COUNT
- MAX
- MIN
- SUM
- AVG
7.3 Relationship Function
Function | Description |
---|---|
STARTNODE | find the start node of a relationship |
ENDNODE | find the end node of a relationship |
ID | find the id of a relationship |
TYPE | find the type of a relationship represented by string |
8. INDEX
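An index in CQL is similar to an index in SQL. In Neo4j 5 an index is created and dropped by name; the older form CREATE INDEX ON :<label-name> (<property-name>) is no longer accepted.
CREATE INDEX <index-name> FOR (n:<label-name>) ON (n.<property-name>);
DROP INDEX <index-name>;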
9. UNIQUE
A UNIQUE constraint can be added to a property of nodes or relationships. In Neo4j 5 the syntax is:
CREATE CONSTRAINT <constraint-name> FOR (n:<label-name>) REQUIRE n.<property-name> IS UNIQUE;
DROP CONSTRAINT <constraint-name>;
10. Other Things
We will use Gremlin and G.V to access Neo4j in the ArkAnalyzer project.
In the future, we can also use Neo4j with the Spring framework in other projects.