Neo4j

Yunge Hu | ArkAnalyzer

Neo4j is a graph database. We will use it to store the CG (Call Graph) and CFG (Control Flow Graph) generated by ArkAnalyzer.

We will develop an extension for ArkAnalyzer called Ark2Graph. The steps are as follows:

  1. Get CG and CFG from ArkAnalyzer.
  2. Traverse the CG and CFG.
  3. Save the results into Neo4j.
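
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not ArkAnalyzer's API: the `CallGraph` shape, the `Method` label, the `CALLS` relationship type and the `signature` property are hypothetical names chosen for this example. The sketch walks an in-memory call graph from an entry method and produces parameterized Cypher statements that could later be passed to `session.run`.

```typescript
// Hypothetical call graph: caller signature -> callee signatures.
// This shape is an assumption for illustration, not ArkAnalyzer's real output.
type CallGraph = { [caller: string]: string[] };

interface CallEdge {
    caller: string;
    callee: string;
}

interface CypherStatement {
    text: string;
    params: { caller: string; callee: string };
}

// Step 2: traverse the call graph depth-first from an entry method and
// collect every caller -> callee edge that is reachable from it.
function collectCallEdges(cg: CallGraph, entry: string): CallEdge[] {
    const edges: CallEdge[] = [];
    const visited = new Set<string>();
    const stack: string[] = [entry];
    while (stack.length > 0) {
        const caller = stack.pop() as string;
        if (visited.has(caller)) continue;
        visited.add(caller);
        for (const callee of cg[caller] ?? []) {
            edges.push({ caller, callee });
            stack.push(callee);
        }
    }
    return edges;
}

// Step 3: turn each edge into one parameterized statement. MERGE is used so
// that a method appearing in several edges is stored as a single node.
function callEdgesToCypher(edges: CallEdge[]): CypherStatement[] {
    return edges.map((edge) => ({
        text:
            "MERGE (a:Method {signature: $caller}) " +
            "MERGE (b:Method {signature: $callee}) " +
            "MERGE (a)-[:CALLS]->(b)",
        params: { caller: edge.caller, callee: edge.callee },
    }));
}

const demoCallGraph: CallGraph = {
    "main()": ["foo()", "bar()"],
    "foo()": ["bar()"],
};
const demoStatements = callEdgesToCypher(collectCallEdges(demoCallGraph, "main()"));
console.log(demoStatements.length);
```

Keeping the traversal separate from the statement generation means the same edge list could also feed a different backend later.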

It is therefore essential to deploy a Neo4j server on the machine where we develop this extension.

1. Download and Run Neo4j Server

First, download the Neo4j Desktop installer from the Neo4j website.

Before the download starts, we need to fill in a form with our personal information. The website then displays an activation key, which we should copy and save to a local file.

Once the download finishes, run the installer. Choose where to install the application and where to store its data; the server then starts automatically and the Neo4j Desktop window opens, where we can view the status of the server and manage it.

Do not close the command window that opens automatically, or errors will occur.

2. Connect to Neo4j Server

2.1 Set Username and Password

First, open Neo4j Browser from Neo4j Desktop and run the following Cypher statement in the console:

ALTER USER neo4j SET PASSWORD 'neo4j123';

This statement changes the password of the user neo4j.

Alternatively, we can create a new user and grant it the admin role:

CREATE USER new_user SET PASSWORD 'new_password' CHANGE NOT REQUIRED;
GRANT ROLE admin TO new_user;

We can then log in with this username and password, either by visiting http://localhost:7474 in a web browser or by connecting to neo4j://localhost:7687 from a terminal.

2.2 Connect to Neo4j Server in TS

First, install the Neo4j driver:

npm install neo4j-driver

Then we need to edit the Neo4j server configuration (neo4j.conf) so that the server accepts connections on all network interfaces:

# Use 0.0.0.0 to bind to all network interfaces on the machine
server.default_listen_address=0.0.0.0

Then we can create a TypeScript file to test this connection:

import neo4j from "neo4j-driver";

// Change this to the address where your Neo4j server is running
const uri = "bolt://172.18.48.1:7687";
const username = "neo4j";
const password = "neo4j123";

const driver = neo4j.driver(uri, neo4j.auth.basic(username, password));
const session = driver.session();

async function createNode() {
    try {
        const result = await session.run(
            "CREATE (n:Person {name: $name, age: $age}) RETURN n",
            { name: "Neo4j", age: 20 }
        );
        console.log(result.records[0].get("n"));
    } catch (error) {
        console.error('Error creating node:', error);
    } finally {
        await session.close();
    }
}

createNode().then(() => {
    driver.close();
});

Running this file prints output similar to the following:

Node {
  identity: Integer { low: 169, high: 0 },
  labels: [ 'Person' ],
  properties: { name: 'Neo4j', age: 20 },
  elementId: '4:e920355b-215c-4a27-aaab-8d40cf5dfa18:169'
}
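
The `identity` field in this output is a 64-bit integer that the driver represents as two 32-bit halves, `low` and `high`. The sketch below shows how the two halves combine into a plain JavaScript number. `DriverIntegerLike` and `integerToNumber` are illustrative names for this example and are not part of neo4j-driver; the driver ships its own conversion helpers, which should be preferred in real code. The result is only exact for values within `Number.MAX_SAFE_INTEGER`.

```typescript
// Illustrative shape matching the `Integer { low, high }` output above.
interface DriverIntegerLike {
    low: number;  // lower 32 bits, stored as a signed 32-bit value
    high: number; // upper 32 bits, stored as a signed 32-bit value
}

function integerToNumber(value: DriverIntegerLike): number {
    // `>>> 0` reinterprets the signed low half as an unsigned 32-bit value,
    // then the high half is shifted up by 2^32.
    return value.high * 4294967296 + (value.low >>> 0);
}

console.log(integerToNumber({ low: 169, high: 0 })); // 169
```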

3. Neo4j Modules

A Neo4j graph database consists of the following building blocks:

  • Node
  • Attribute
  • Relationship
  • Label
  • Data Browser

3.1 Node

A node is the basic unit of a graph. It stores attributes as key-value pairs.

EmployeeNode = {
    EmpName: "Neo4j",
    EmpNumber: "123456",
    Salary: "3500",
};

In this example, the name of the node is EmployeeNode.

3.2 Attribute

Attributes are described by properties: key-value pairs that describe graph nodes and relationships.

The key is a string, and the value can be of any data type that Neo4j supports.

3.3 Relationship

Relationships connect two nodes, and each relationship contains a starting node and an ending node.

Just like attributes, relationships can also contain key-value properties.

3.4 Label

A label associates a common name with a group of nodes, and each node can carry zero or more labels. Relationships do not carry labels; instead, each relationship has exactly one type, such as FRIENDS_WITH.

We can add new labels to nodes and remove labels from nodes. The type of a relationship is fixed when the relationship is created.
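
The concepts introduced so far (nodes, attributes, relationships and labels) can be sketched as plain TypeScript types. This is a simplified model for illustration only: the type names are made up for this example, property values are restricted to a few primitive types, and the `WORKS_AT` relationship and `Company` node are hypothetical sample data.

```typescript
// Property values are simplified here to a few primitive types.
type PropertyValue = string | number | boolean;
type Properties = { [key: string]: PropertyValue };

interface GraphNode {
    id: number;
    labels: string[];       // a node may carry several labels
    properties: Properties; // key-value attributes
}

interface GraphRelationship {
    type: string;           // e.g. "WORKS_AT"
    startId: number;        // id of the starting node
    endId: number;          // id of the ending node
    properties: Properties; // relationships can hold attributes too
}

const employee: GraphNode = {
    id: 1,
    labels: ["Employee"],
    properties: { EmpName: "Neo4j", EmpNumber: "123456", Salary: "3500" },
};
const company: GraphNode = {
    id: 2,
    labels: ["Company"],
    properties: { name: "TechCorp" },
};
const worksAt: GraphRelationship = {
    type: "WORKS_AT",
    startId: employee.id,
    endId: company.id,
    properties: { since: 2015 },
};

// Check whether a node carries a given label.
function hasLabel(node: GraphNode, label: string): boolean {
    return node.labels.indexOf(label) !== -1;
}

console.log(hasLabel(employee, "Employee"), worksAt.type);
```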

3.5 Data Browser

Data Browser is a GUI for running Cypher commands and viewing their output. We can also use it to export the output.

4. Neo4j CQL (Cypher Query Language)

4.1 Neo4j Data Type

Data Type    Description
boolean
byte         int8
short        int16
int          int32
long         int64
float        float32
double       float64
char         utf-16
String

4.2 CREATE

The CREATE command is used to add new nodes, relationships or entire graph structures to the database. In addition, CREATE can be combined with RETURN to fetch the newly created data.

Here are some examples of how to use the CREATE command:

// Create a Node
CREATE (n:Person {name: 'John', age: 30});
// Create Multiple Nodes
CREATE (a:City {name: 'Paris'}), (b:City {name: 'New York'});
// Create a Relationship Between Nodes
CREATE (a:Person {name: 'John'})-[:FRIENDS_WITH]->(b:Person {name: 'Jane'});
// Create Multiple Relationships (a, b and c are unbound here, so three new nodes are created as well)
CREATE (a)-[:FRIENDS_WITH]->(b), (b)-[:FRIENDS_WITH]->(c);
// Create Complex Graph Structures
CREATE (a:Person {name: 'Alice'})-[:WORKS_AT]->(c:Company {name: 'TechCorp'})<-[:WORKS_AT]-(b:Person {name: 'Bob'});
// Return Created Data
CREATE (n:Person {name: 'John', age: 30}) RETURN n;

The syntax of the CREATE command for a single node is as follows:

CREATE (<node-name>:<label-name> <properties>) [RETURN <node-name>];
// <properties> -> {<key-values>} | \epsilon
// <key-values> -> [<key-value-with-commas>] <key-value>
// <key-value-with-commas> -> <key-value-with-comma> [<key-value-with-commas>]
// <key-value-with-comma> -> <key-value> ,
// <key-value> -> <key>:<value>
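
As a rough illustration of this syntax, the sketch below assembles a single-node CREATE statement from a label and a property map. `buildCreate` is a hypothetical helper written for this example, not part of neo4j-driver. Property values are passed as `$` parameters, while the label and the property keys are validated and spliced into the text, because they cannot be supplied as ordinary parameters.

```typescript
type CreateParams = { [key: string]: string | number | boolean };

interface CreateStatement {
    text: string;
    params: CreateParams;
}

// Build "CREATE (n:<label> {k1: $k1, ...}) RETURN n".
function buildCreate(label: string, properties: CreateParams): CreateStatement {
    // Only simple identifiers are accepted, since these parts end up in the query text.
    const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
    if (!identifier.test(label)) {
        throw new Error("invalid label: " + label);
    }
    const keys = Object.keys(properties);
    for (const key of keys) {
        if (!identifier.test(key)) {
            throw new Error("invalid property key: " + key);
        }
    }
    // <properties> may be empty, in which case no map is emitted at all.
    const body = keys.map((key) => key + ": $" + key).join(", ");
    const propertyMap = keys.length > 0 ? " {" + body + "}" : "";
    return { text: "CREATE (n:" + label + propertyMap + ") RETURN n", params: properties };
}

console.log(buildCreate("Person", { name: "John", age: 30 }).text);
// CREATE (n:Person {name: $name, age: $age}) RETURN n
```

The returned `text` and `params` have the same shape as the two arguments given to `session.run` in the connection test earlier.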

4.3 MATCH

The MATCH command is used to retrieve nodes, relationships or entire graph structures from a Neo4j database.

The basic syntax of the MATCH command is as follows:

MATCH (<node-name>:<label-name>) RETURN <node-name>;

Here are some examples of how to use the MATCH command:

// match a node with a specific label
MATCH (p:Person) RETURN p;
// match a node with specific properties
MATCH (p:Person {name: 'John'}) RETURN p;
// match relationships between nodes
MATCH (p:Person)-[:FRIENDS_WITH]->(q:Person) RETURN p, q;
// match patterns with both nodes and relationships
MATCH (a:Person)-[r:WORKS_AT]->(c:Company) RETURN a, r, c;
// match two unrelated patterns (returns every combination of a and b)
MATCH (a:Person), (b:City) RETURN a, b;
// match nodes with multiple labels
MATCH (m:Movie:Drama) RETURN m;
// match nodes with conditional properties
MATCH (p:Person) WHERE p.age > 30 RETURN p;
// match any two nodes connected by any relationship
MATCH (p)-[r]->(q) RETURN p, r, q;
// match nodes that lack a given outgoing relationship
MATCH (p:Person) WHERE NOT (p)-[:FRIENDS_WITH]->() RETURN p;
// match variable-length paths and bind each one to a path variable
MATCH path = (p:Person)-[:FRIENDS_WITH*]->(q:Person) RETURN path;
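
The last example uses a variable-length pattern (`*`), which matches paths of one or more FRIENDS_WITH relationships. The sketch below imitates this on an in-memory edge list for a single start node. It is an illustration of the idea rather than how Neo4j evaluates the query; `Friendship` and `friendPaths` are made-up names. It relies on one Cypher rule: within a single matched path, the same relationship is not traversed twice, which is what keeps cycles from producing endless paths.

```typescript
interface Friendship {
    from: string;
    to: string;
}

// Enumerate every path of length >= 1 that starts at `start`, follows edges
// in their direction, and never reuses the same edge within one path.
function friendPaths(edges: Friendship[], start: string): string[][] {
    const paths: string[][] = [];
    const used: boolean[] = edges.map(() => false);

    function extend(current: string, path: string[]): void {
        edges.forEach((edge, index) => {
            if (edge.from !== current || used[index]) return;
            used[index] = true;            // mark the edge as part of this path
            const next = path.concat(edge.to);
            paths.push(next);              // every prefix is itself a match
            extend(edge.to, next);
            used[index] = false;           // backtrack
        });
    }

    extend(start, [start]);
    return paths;
}

// A -> B -> C -> A forms a cycle, yet only three paths start at A.
const friendshipCycle: Friendship[] = [
    { from: "A", to: "B" },
    { from: "B", to: "C" },
    { from: "C", to: "A" },
];
console.log(friendPaths(friendshipCycle, "A"));
```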

4.4 DELETE and REMOVE

The DELETE command is used to remove nodes and relationships from the graph.

Here are some examples:

// delete a node
MATCH (n:Person {name: 'John'}) DELETE n;
// delete a relationship without deleting the nodes
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) DELETE r;
// delete a node together with all of its relationships
MATCH (n:Person {name: 'John'}) DETACH DELETE n;

The REMOVE command is used to remove labels from nodes and properties from nodes or relationships.

Here are some examples:

// remove a label from a node
MATCH (n:Person {name: 'John'}) REMOVE n:Person;
// remove multiple labels
MATCH (n {name: 'John'}) REMOVE n:Person:Employee;
// remove a property from a node
MATCH (n:Person {name: 'John'}) REMOVE n.age;
// remove a property from a relationship
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) REMOVE r.since;

A node that still has relationships cannot be deleted with a plain DELETE. Delete those relationships first, or use DETACH DELETE.

REMOVE leaves the node or relationship itself in place; only the label or property is removed.

DO NOT USE DELETE OR REMOVE WITHOUT MATCH.
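
The rule that a node with relationships cannot be deleted directly can be illustrated with a small in-memory model. `TinyGraph` and its methods are hypothetical names for this sketch and do not mirror Neo4j internals. `deleteNode` behaves like a plain DELETE and refuses while relationships are attached, whereas `detachDeleteNode` behaves like Cypher's DETACH DELETE, which removes the attached relationships first.

```typescript
interface StoredNode {
    id: number;
}

interface StoredRel {
    id: number;
    startId: number;
    endId: number;
}

class TinyGraph {
    nodes: StoredNode[] = [];
    rels: StoredRel[] = [];

    // Like plain DELETE: refuse while any relationship touches the node.
    deleteNode(id: number): void {
        const attached = this.rels.some((r) => r.startId === id || r.endId === id);
        if (attached) {
            throw new Error("node " + id + " still has relationships");
        }
        this.nodes = this.nodes.filter((n) => n.id !== id);
    }

    // Like DETACH DELETE: drop the attached relationships, then the node.
    detachDeleteNode(id: number): void {
        this.rels = this.rels.filter((r) => r.startId !== id && r.endId !== id);
        this.deleteNode(id);
    }
}

const tinyGraph = new TinyGraph();
tinyGraph.nodes = [{ id: 1 }, { id: 2 }];
tinyGraph.rels = [{ id: 10, startId: 1, endId: 2 }];
tinyGraph.detachDeleteNode(1);
console.log(tinyGraph.nodes.length, tinyGraph.rels.length); // 1 0
```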

4.5 SET

The SET command is used to add or update properties on nodes or relationships, and to add labels to nodes.

// update properties
MATCH (n:Person {name: 'John'}) SET n.age = 30;
MATCH (n:Person {name: 'John'}) SET n.age = 30, n.city = 'New York';
MATCH (a:Person)-[r:FRIENDS_WITH]->(b:Person) SET r.since = 2015;
// add labels
MATCH (n:Person {name: 'John'}) SET n:Employee;
MATCH (n:Person {name: 'John'}) SET n:Employee:Manager;
// remove properties
MATCH (n:Person {name: 'John'}) SET n.age = NULL;
MATCH (n:Person {name: 'John'}) SET n.age = NULL, n.city = NULL;
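
The effect of SET on a property map, including the way assigning NULL removes a property, can be sketched in a few lines. `applySet` is a hypothetical helper for illustration; it works on a plain object and does not talk to the database.

```typescript
type PropValue = string | number | boolean;
type PropMap = { [key: string]: PropValue };
type PropUpdates = { [key: string]: PropValue | null };

// Return a copy of `current` with the updates applied the way SET would:
// a non-null value adds or overwrites the property, null removes it.
function applySet(current: PropMap, updates: PropUpdates): PropMap {
    const result: PropMap = { ...current };
    for (const key of Object.keys(updates)) {
        const value = updates[key];
        if (value === null || value === undefined) {
            delete result[key];  // SET n.key = NULL removes the property
        } else {
            result[key] = value; // add or overwrite
        }
    }
    return result;
}

console.log(applySet({ name: "John", age: 30 }, { age: null, city: "New York" }));
// { name: 'John', city: 'New York' }
```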

4.6 ORDER BY

The ORDER BY clause is used to sort the results of a query based on one or more properties of nodes or relationships.

The syntax is as follows:

MATCH (n) RETURN n ORDER BY n.<property> [ASC|DESC];

  • ASC (ascending) is the default sorting order.
  • DESC is used for descending order.

4.7 UNION

The UNION clause is used to combine the results of two or more MATCH or RETURN queries. The key points to keep in mind when using UNION are:

  • The result sets from the queries being combined must have the same number of columns, with the same column names.
  • The columns must have compatible data types.
  • By default, UNION removes duplicate rows. If you want to include duplicates, use UNION ALL.
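
The difference between UNION and UNION ALL can be illustrated on in-memory result rows. This is a sketch of the semantics only; `ResultRow`, `unionAll` and `union` are made-up names, and the column check is reduced to comparing the column names of the first row of each result.

```typescript
type ResultRow = { [column: string]: string | number | boolean | null };

// Sorted, comma-joined column names of a result (empty string for no rows).
function columnsOf(rows: ResultRow[]): string {
    return rows.length > 0 ? Object.keys(rows[0]).sort().join(",") : "";
}

// UNION ALL: concatenate both results and keep duplicates.
function unionAll(first: ResultRow[], second: ResultRow[]): ResultRow[] {
    if (first.length > 0 && second.length > 0 && columnsOf(first) !== columnsOf(second)) {
        throw new Error("both queries must return the same columns");
    }
    return first.concat(second);
}

// UNION: like UNION ALL, but each distinct row appears only once.
function union(first: ResultRow[], second: ResultRow[]): ResultRow[] {
    const seen = new Set<string>();
    return unionAll(first, second).filter((row) => {
        // Serialize with sorted keys so that key order does not matter.
        const key = JSON.stringify(row, Object.keys(row).sort());
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });
}

const friendsRows: ResultRow[] = [{ name: "Alice" }, { name: "Bob" }];
const colleagueRows: ResultRow[] = [{ name: "Bob" }, { name: "Carol" }];
console.log(union(friendsRows, colleagueRows).length);    // 3
console.log(unionAll(friendsRows, colleagueRows).length); // 4
```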

Syntax is as follows:

<query1>
UNION [ALL]
<query2>

4.8 LIMIT and SKIP

LIMIT and SKIP are used to control the number of results returned and to paginate through results.

  • LIMIT specifies the maximum number of rows to return.
  • SKIP specifies the number of rows to skip before starting to return results.

These clauses are typically used together for pagination.

Syntax is as follows:

MATCH (n) RETURN n ORDER BY n.<property> SKIP <number> LIMIT <number>;
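
For pagination, SKIP and LIMIT are usually derived from a page number and a page size. The sketch below builds such a query; `pageQuery` is a hypothetical helper, and pages are assumed to be numbered from 1. One caveat when passing the result to neo4j-driver: plain JavaScript numbers are sent as floats, so the `skip` and `limit` values would likely need to be wrapped with the driver's `neo4j.int` before being handed to `session.run`.

```typescript
interface PagedQuery {
    text: string;
    params: { skip: number; limit: number };
}

// Build a paginated query ordered by one property. Pages start at 1.
function pageQuery(property: string, page: number, pageSize: number): PagedQuery {
    // The property name ends up in the query text, so only simple identifiers are accepted.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(property)) {
        throw new Error("invalid property name: " + property);
    }
    if (!Number.isInteger(page) || page < 1) {
        throw new Error("page must be a positive integer");
    }
    if (!Number.isInteger(pageSize) || pageSize < 1) {
        throw new Error("pageSize must be a positive integer");
    }
    return {
        text: "MATCH (n) RETURN n ORDER BY n." + property + " SKIP $skip LIMIT $limit",
        // Page 1 skips nothing, page 2 skips one full page, and so on.
        params: { skip: (page - 1) * pageSize, limit: pageSize },
    };
}

console.log(pageQuery("name", 3, 10).params); // { skip: 20, limit: 10 }
```

An ORDER BY is included because without a stable order the same row could appear on two different pages.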

4.9 MERGE

The MERGE clause is used to ensure that a specific pattern (nodes, relationships, or both) exists in the graph. If the pattern does not exist, Cypher creates it. If the pattern already exists, Cypher matches it without creating duplicates.

MERGE behaves like a combination of MATCH and CREATE. Note that it matches the pattern as a whole: if any part of the pattern is missing, the entire pattern is created.

Here are some examples:

// merge a node
MERGE (p:Person {name: 'Alice'}) RETURN p;
MERGE (p:Person {name: 'Alice', age: 30}) RETURN p;
// merge a relationship
MATCH (p1:Person {name: 'Alice'}), (p2:Person {name: 'Bob'}) MERGE (p1)-[:FRIEND]->(p2) RETURN p1, p2;
// use `ON CREATE` and `ON MATCH`
MERGE (p:Person {name: 'Alice'})
ON CREATE SET p.createdAt = timestamp()
ON MATCH SET p.lastSeen = timestamp()
RETURN p;
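
The match-or-create behavior of MERGE, together with ON CREATE and ON MATCH, can be imitated on an in-memory node list. This is a simplified sketch: `mergeNode` is a made-up helper, it handles single nodes only, and it returns just the first match even though Cypher would bind every matching node.

```typescript
type MergeProps = { [key: string]: string | number };

interface MergeNode {
    label: string;
    props: MergeProps;
}

function mergeNode(
    store: MergeNode[],
    label: string,
    pattern: MergeProps,
    onCreate: MergeProps = {},
    onMatch: MergeProps = {}
): MergeNode {
    // A node matches only if it has the label and every property in the pattern.
    const matches = store.filter(
        (node) =>
            node.label === label &&
            Object.keys(pattern).every((key) => node.props[key] === pattern[key])
    );
    if (matches.length > 0) {
        // ON MATCH branch: update the existing nodes, create nothing.
        matches.forEach((node) => Object.assign(node.props, onMatch));
        return matches[0];
    }
    // ON CREATE branch: create the whole pattern plus the ON CREATE properties.
    const created: MergeNode = { label, props: { ...pattern, ...onCreate } };
    store.push(created);
    return created;
}

const people: MergeNode[] = [];
mergeNode(people, "Person", { name: "Alice" }, { createdAt: 1 }, { lastSeen: 2 });
mergeNode(people, "Person", { name: "Alice" }, { createdAt: 3 }, { lastSeen: 4 });
console.log(people.length); // 1
// No existing node has both name 'Alice' and age 30, so a second node is created.
mergeNode(people, "Person", { name: "Alice", age: 30 });
console.log(people.length); // 2
```

The last call shows why it is usually safer to MERGE on an identifying property alone and set the remaining properties with ON CREATE or SET.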

4.10 IN

The IN operator is used to check if a specific value exists within a list or collection.

Syntax is as follows:

MATCH (n:Label)
WHERE n.property IN [value1, value2, value3]
RETURN n;

We can also use IN with a list collected earlier in the same query:

MATCH (p:Person)
WITH COLLECT(p.name) AS names
MATCH (o:Order)
WHERE o.customerName IN names
RETURN o;

4.11 NULL

NULL represents the absence of a value or the concept of "nothing."

Here are some key points regarding NULL in Neo4j:

  • NULL signifies that a property does not hold any value. For instance, if a node has a property age and this property is not set, it will be considered NULL.
  • NULL is not considered a value of any data type (e.g., integer, string). Instead, it is a distinct state that indicates the lack of a value.
  • We can check for NULL using the IS NULL or IS NOT NULL conditions in Cypher queries.
  • If we try to access a property that does not exist on a node, it will return NULL.
  • Aggregation functions in Cypher skip NULL values. For example, COUNT(n.age) counts only the rows where n.age is not NULL, while COUNT(*) counts every row.
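
These points follow from Cypher's three-valued logic: a comparison that involves NULL evaluates to NULL rather than true or false, and WHERE keeps a row only when its predicate is true. The sketch below models this in TypeScript, with `null` standing in for NULL. The function names are illustrative only.

```typescript
type Ternary = boolean | null;
type Comparable = string | number | null;

// Equality as Cypher evaluates it: any NULL operand makes the result NULL.
function cypherEquals(a: Comparable, b: Comparable): Ternary {
    if (a === null || b === null) return null;
    return a === b;
}

// The IS NULL check is the only reliable way to test for a missing value.
function isNull(a: Comparable): boolean {
    return a === null;
}

// WHERE keeps a row only when the predicate is exactly true.
function whereKeeps(predicate: Ternary): boolean {
    return predicate === true;
}

// COUNT(expression) ignores NULL values.
function countNonNull(values: Comparable[]): number {
    return values.filter((value) => value !== null).length;
}

console.log(cypherEquals(null, null));             // null
console.log(whereKeeps(cypherEquals(null, null))); // false
console.log(isNull(null));                         // true
console.log(countNonNull([30, null, 25]));         // 2
```

This is why a filter such as `WHERE p.age = NULL` never returns rows, while `WHERE p.age IS NULL` does.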

5. id Property

The id property refers to a unique identifier for nodes and relationships within the database. This identifier is automatically assigned by Neo4j and serves as a way to uniquely identify each element in the graph.

Here are some key points regarding the id property:

  • When creating a new node or relationship, Neo4j automatically assigns a unique id to it. This id is not user-defined and cannot be changed.
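  • Each node has an id that is unique among the nodes of its graph, and each relationship has an id that is unique among the relationships. Nodes and relationships are numbered separately, so a node and a relationship may share the same id value.
  • The id is an integer value assigned by the database, typically starting from 0 and increasing as new nodes or relationships are added. The ids of deleted elements may be reused later.
  • We can access the id of a node or relationship in Cypher queries using the id() function. In Neo4j 5, id() is deprecated in favor of elementId(), which returns a string.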
  • The id property is specific to the Neo4j database instance. If we export and import data, the id values may change.
  • The id property is not meant to be used as a business key or reference in application logic. Instead, it is primarily for internal identification purposes. For application-level unique identification, consider defining our own unique properties.

6. Direction

Direction specifies the way in which relationships are connected between nodes, indicating a one-way connection from one node to another.

  • In Neo4j, all relationships are directed. This means that each relationship has a specific start node and end node, indicating a one-way connection.

  • Each relationship can have a type, which describes the nature of the connection. The direction is indicated by the arrow in the notation.

    (Alice)-[:FRIENDS_WITH]->(Bob)
    (Bob)<-[:WORKS_WITH]-(Charlie)
    
  • While relationships are inherently directed, we can create a structure that allows for bidirectional navigation. This can be achieved by creating two relationships, one in each direction:

    (A)-[:FRIENDS_WITH]->(B)
    (B)-[:FRIENDS_WITH]->(A)
    
  • If we want to find relationships regardless of direction, we can use the relationship pattern without an arrow. For example:

    MATCH (a:Person)-[:FRIENDS_WITH]-(b:Person)
    RETURN b;
    

Relationships are always directed and can be queried based on their direction. The direction concept is essential for accurately modeling and querying graph data.
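
How the arrow in a pattern affects the result can be shown with a small in-memory lookup. `DirectedRel` and `neighbours` are hypothetical names for this sketch, and the extra Carol relationship is sample data added for the example. The three direction values correspond to the patterns `(n)-[:T]->(m)`, `(n)<-[:T]-(m)` and `(n)-[:T]-(m)`.

```typescript
interface DirectedRel {
    start: string;
    type: string;
    end: string;
}

type Direction = "outgoing" | "incoming" | "both";

// Return the nodes connected to `node` by relationships of `type`,
// following them in the requested direction.
function neighbours(rels: DirectedRel[], node: string, type: string, direction: Direction): string[] {
    const found: string[] = [];
    for (const rel of rels) {
        if (rel.type !== type) continue;
        // (node)-[:type]->(other)
        if (direction !== "incoming" && rel.start === node) found.push(rel.end);
        // (node)<-[:type]-(other)
        if (direction !== "outgoing" && rel.end === node) found.push(rel.start);
    }
    return found;
}

const sampleRels: DirectedRel[] = [
    { start: "Alice", type: "FRIENDS_WITH", end: "Bob" },
    { start: "Charlie", type: "WORKS_WITH", end: "Bob" },
    { start: "Carol", type: "FRIENDS_WITH", end: "Alice" },
];

console.log(neighbours(sampleRels, "Alice", "FRIENDS_WITH", "outgoing")); // [ 'Bob' ]
console.log(neighbours(sampleRels, "Alice", "FRIENDS_WITH", "incoming")); // [ 'Carol' ]
console.log(neighbours(sampleRels, "Alice", "FRIENDS_WITH", "both"));     // [ 'Bob', 'Carol' ]
```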

7. CQL Function

7.1 String Function

Function     Description
UPPER        UPPER(<input-string>)
LOWER        LOWER(<input-string>)
SUBSTRING    SUBSTRING(<input-string>,<start-index>,<length>)
REPLACE
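
One detail worth illustrating is SUBSTRING: in Cypher its third argument is a length counted from a zero-based start index, whereas JavaScript's `String.prototype.substring` takes an end index. The sketch below reproduces the Cypher behavior in TypeScript; `cypherSubstring` is a made-up name for this example.

```typescript
// Mimic Cypher's SUBSTRING(original, start, length): `start` is zero-based
// and `length` is the number of characters to take. Without `length`, the
// rest of the string is returned.
function cypherSubstring(input: string, start: number, length?: number): string {
    return length === undefined ? input.slice(start) : input.slice(start, start + length);
}

console.log(cypherSubstring("ArkAnalyzer", 3, 4)); // Anal
console.log("ArkAnalyzer".substring(3, 4));        // A
```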

7.2 Aggregation Function

  • COUNT
  • MAX
  • MIN
  • SUM
  • AVG

7.3 Relationship Function

Function     Description
STARTNODE    find the start node of a relationship
ENDNODE      find the end node of a relationship
ID           find the id of a relationship
TYPE         find the type of a relationship, represented as a string

8. INDEX

An index in CQL is similar to an index in SQL. In current Neo4j versions, an index is created with a name and dropped by that name:

CREATE INDEX <index-name> FOR (n:<label-name>) ON (n.<property-name>);
DROP INDEX <index-name>;

9. UNIQUE

A uniqueness constraint can be placed on a property of nodes or relationships. For nodes, the syntax in current Neo4j versions is:

CREATE CONSTRAINT <constraint-name> FOR (n:<label-name>) REQUIRE n.<property-name> IS UNIQUE;
DROP CONSTRAINT <constraint-name>;

10. Other Things

We will use Gremlin and G.V to access Neo4j in the ArkAnalyzer project.

In the future, we can also use Neo4j with the Spring framework in other projects.

Last Updated 2024/9/28 23:03:16