Neo4j

Neo4j is a native graph database designed to store and navigate relationships efficiently, perfect for connected data and network analysis.

Overview

  • Versions: 5.26, 5.25, 5.24
  • Cluster Support: ❌ No (Single node only)
  • Use Cases: Graph databases, relationships, networks, recommendation engines
  • Features: Cypher queries, APOC procedures, graph algorithms

Key Features

  • Native Graph Storage: Optimized for storing and querying connected data
  • Cypher Query Language: Expressive, SQL-like query language for graphs
  • ACID Transactions: Full transaction support for data integrity
  • Graph Algorithms: Path finding, centrality, and community detection (the gds.* procedures require the Graph Data Science library)
  • APOC Library: Awesome Procedures On Cypher - extensive utility functions
  • Index-Free Adjacency: Traverse relationships without index lookups
  • Schema Flexibility: Optional schema with constraints and indexes
  • Full-Text Search: Built-in full-text indexing capabilities

Resource Tiers

Tier   | CPU | Memory | Disk  | Best For
Small  | 0.5 | 1GB    | 10GB  | Development, testing
Medium | 1   | 2GB    | 25GB  | Small production apps
Large  | 2   | 4GB    | 50GB  | Production workloads
XLarge | 4   | 8GB    | 100GB | Complex graph queries

Creating a Neo4j Add-on

  1. Navigate to Add-ons → Create Add-on
  2. Select Neo4j as the type
  3. Choose a version (5.26, 5.25, or 5.24)
  4. Configure:
    • Label: Descriptive name (e.g., "Knowledge Graph")
    • Description: Purpose and notes
    • Environment: Development or Production
    • Resource Tier: Based on your workload requirements
  5. Configure backups:
    • Schedule: Daily recommended for production
    • Retention: 7+ days for production
  6. Click Create Add-on

Connection Information

After deployment, connection details are available in the add-on details page and automatically injected into your apps via STRONGLY_SERVICES.

Connection String Format

bolt://username:password@host:7687
neo4j://username:password@host:7687
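
If you would rather pass credentials to a driver separately than embed them in the URI, the connection string can be split with the standard library. This is a minimal sketch; the helper name `parse_neo4j_uri`, the sample host, and the sample password are illustrative and not part of the platform.

```python
from urllib.parse import urlsplit, unquote


def parse_neo4j_uri(uri):
    """Split a bolt:// or neo4j:// URI into a credential-free URI plus credentials."""
    parts = urlsplit(uri)
    if parts.scheme not in ("bolt", "neo4j"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        # Fall back to the port shown in the format above when none is given
        "uri": f"{parts.scheme}://{parts.hostname}:{parts.port or 7687}",
        # Percent-encoded characters in credentials are decoded
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


# Hypothetical connection string, for illustration only
info = parse_neo4j_uri("neo4j://neo4j:s3cret%21@db.internal:7687")
print(info["uri"])
```

The returned `uri`, `username`, and `password` values can then be handed to a driver as separate arguments.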

Accessing Connection Details

import os
import json
from neo4j import GraphDatabase

# Parse STRONGLY_SERVICES (injected by the platform as a JSON string)
services = json.loads(os.environ.get('STRONGLY_SERVICES', '{}'))

# Get the Neo4j add-on connection; replace 'addon-id' with your add-on's ID
neo4j_addon = services['addons']['addon-id']
auth = (neo4j_addon['username'], neo4j_addon['password'])

# Option 1: connect using the connection string
driver = GraphDatabase.driver(neo4j_addon['connectionString'], auth=auth)

# Option 2: connect using individual parameters instead
# driver = GraphDatabase.driver(
#     f"bolt://{neo4j_addon['host']}:{neo4j_addon['port']}",
#     auth=auth
# )

# Run a parameterized query
with driver.session() as session:
    result = session.run(
        "MATCH (p:Person) WHERE p.name = $name RETURN p",
        name="Alice"
    )
    for record in result:
        print(record["p"])

driver.close()
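
To avoid hard-coding the add-on ID, one option is to scan the parsed services for entries whose connection string uses a Neo4j scheme. The sketch below assumes only the `addons` and `connectionString` fields used above; the helper name `find_neo4j_addons` and the sample payload are hypothetical.

```python
import json
import os


def find_neo4j_addons(services):
    """Return (addon_id, details) pairs whose connection string looks like Neo4j."""
    addons = services.get("addons", {})
    return [
        (addon_id, details)
        for addon_id, details in addons.items()
        if str(details.get("connectionString", "")).startswith(("bolt://", "neo4j://"))
    ]


# In an app, read the real payload from the environment
services = json.loads(os.environ.get("STRONGLY_SERVICES", "{}"))

# Illustrative payload only; IDs and hosts are made up
sample = {
    "addons": {
        "cache-1": {"connectionString": "redis://cache.internal:6379"},
        "graph-1": {
            "connectionString": "bolt://graph.internal:7687",
            "username": "neo4j",
            "password": "example",
        },
    }
}
matches = find_neo4j_addons(sample)
print([addon_id for addon_id, _ in matches])
```

If more than one Neo4j add-on is attached, the list contains several entries and the app still has to choose one explicitly.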

Cypher Query Language

Cypher is Neo4j's declarative query language for working with graph data.

Basic Syntax

// Create nodes
CREATE (p:Person {name: 'Alice', age: 30})
CREATE (c:Company {name: 'Acme Corp'})

// Create relationship
MATCH (p:Person {name: 'Alice'})
MATCH (c:Company {name: 'Acme Corp'})
CREATE (p)-[:WORKS_FOR {since: 2020}]->(c)

// Or create everything at once
CREATE (p:Person {name: 'Bob', age: 25})-[:WORKS_FOR {since: 2021}]->(c:Company {name: 'Tech Inc'})

// Match pattern
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name

// Match with where clause
MATCH (p:Person)
WHERE p.age > 25
RETURN p.name, p.age

// Or inline where
MATCH (p:Person {age: 30})
RETURN p

Common Operations

Creating Nodes and Relationships

// Create multiple nodes
CREATE (alice:Person {name: 'Alice', email: 'alice@example.com'})
CREATE (bob:Person {name: 'Bob', email: 'bob@example.com'})
CREATE (python:Skill {name: 'Python', category: 'Programming'})
CREATE (ml:Skill {name: 'Machine Learning', category: 'AI'})

// Create relationships
MATCH (p:Person {name: 'Alice'})
MATCH (s:Skill {name: 'Python'})
CREATE (p)-[:HAS_SKILL {level: 'Expert', years: 5}]->(s)

// Create if not exists (MERGE)
MERGE (p:Person {email: 'charlie@example.com'})
ON CREATE SET p.name = 'Charlie', p.created = timestamp()
ON MATCH SET p.lastSeen = timestamp()

Querying Patterns

// Find direct relationships
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name AS employee, c.name AS company

// Find paths with multiple hops
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)-[:LOCATED_IN]->(city:City)
RETURN p.name, c.name, city.name

// Variable length paths
MATCH (p:Person)-[:KNOWS*1..3]->(friend:Person)
WHERE p.name = 'Alice'
RETURN DISTINCT friend.name

// Shortest path
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:KNOWS*]-(bob:Person {name: 'Bob'})
)
RETURN path

// All paths
MATCH path = (alice:Person {name: 'Alice'})-[:KNOWS*..4]-(bob:Person {name: 'Bob'})
RETURN path

Filtering and Conditions

// WHERE clause
MATCH (p:Person)
WHERE p.age >= 25 AND p.age <= 40
RETURN p.name, p.age

// String matching
MATCH (p:Person)
WHERE p.name STARTS WITH 'A'
RETURN p.name

MATCH (p:Person)
WHERE p.email CONTAINS '@example.com'
RETURN p.name, p.email

// Pattern matching in WHERE
MATCH (p:Person)
WHERE (p)-[:WORKS_FOR]->(:Company {name: 'Acme Corp'})
RETURN p.name

// NOT pattern
MATCH (p:Person)
WHERE NOT (p)-[:WORKS_FOR]->(:Company)
RETURN p.name AS freelancers

// IN operator
MATCH (p:Person)
WHERE p.name IN ['Alice', 'Bob', 'Charlie']
RETURN p

Updating Data

// Update properties
MATCH (p:Person {name: 'Alice'})
SET p.age = 31, p.lastUpdated = timestamp()

// Add label
MATCH (p:Person {name: 'Alice'})
SET p:Manager

// Remove property
MATCH (p:Person {name: 'Alice'})
REMOVE p.temporaryFlag

// Remove label
MATCH (p:Person {name: 'Alice'})
REMOVE p:Manager

Deleting Data

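// Delete a node and its relationships explicitly
// (a node with relationships cannot be deleted on its own; this pattern
// only matches if Alice has at least one relationship)
MATCH (p:Person {name: 'Alice'})-[r]-()
DELETE r, p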

// Or use DETACH DELETE
MATCH (p:Person {name: 'Alice'})
DETACH DELETE p

// Delete relationship
MATCH (p:Person {name: 'Alice'})-[r:WORKS_FOR]->()
DELETE r

// Delete all (careful!)
MATCH (n)
DETACH DELETE n

Aggregation

// Count
MATCH (p:Person)
RETURN count(p) AS totalPeople

// Group by and aggregate
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name, count(p) AS employeeCount
ORDER BY employeeCount DESC

// Multiple aggregations
MATCH (p:Person)
RETURN
  count(p) AS total,
  avg(p.age) AS averageAge,
  min(p.age) AS youngest,
  max(p.age) AS oldest

// Collect into list
MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN c.name, collect(p.name) AS employees

// DISTINCT
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN count(DISTINCT c) AS numberOfCompanies

Ordering and Limiting

// Order by
MATCH (p:Person)
RETURN p.name, p.age
ORDER BY p.age DESC

// Multiple order fields
MATCH (p:Person)
RETURN p
ORDER BY p.age DESC, p.name ASC

// Limit
MATCH (p:Person)
RETURN p.name
ORDER BY p.age DESC
LIMIT 10

// Skip and limit (pagination)
MATCH (p:Person)
RETURN p.name, p.age
ORDER BY p.age DESC
SKIP 20
LIMIT 10
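
From application code, the SKIP and LIMIT values are usually derived from a page number and passed as query parameters. A minimal sketch, assuming 1-based page numbers; `page_params` and `PAGED_QUERY` are illustrative names.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into SKIP/LIMIT parameter values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"skip": (page - 1) * page_size, "limit": page_size}


# Same query as above, with SKIP and LIMIT supplied as parameters
PAGED_QUERY = (
    "MATCH (p:Person) "
    "RETURN p.name, p.age "
    "ORDER BY p.age DESC "
    "SKIP $skip LIMIT $limit"
)

params = page_params(3)
print(params)
# With an open driver session: session.run(PAGED_QUERY, **params)
```

Page 3 with a page size of 10 yields the same SKIP 20 / LIMIT 10 pair as the literal query above.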

Indexes and Constraints

// Create index
CREATE INDEX person_email FOR (p:Person) ON (p.email)

// Create composite index
CREATE INDEX person_name_age FOR (p:Person) ON (p.name, p.age)

// Create full-text index
CREATE FULLTEXT INDEX person_search FOR (p:Person) ON EACH [p.name, p.email]

// Use full-text index
CALL db.index.fulltext.queryNodes('person_search', 'alice*')
YIELD node, score
RETURN node.name, score

// Create unique constraint
CREATE CONSTRAINT person_email_unique FOR (p:Person) REQUIRE p.email IS UNIQUE

// Create existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_name_exists FOR (p:Person) REQUIRE p.name IS NOT NULL

// List indexes
SHOW INDEXES

// List constraints
SHOW CONSTRAINTS

// Drop index
DROP INDEX person_email

// Drop constraint
DROP CONSTRAINT person_email_unique

APOC Procedures

APOC (Awesome Procedures On Cypher) provides additional utility functions.

// Date formatting
RETURN apoc.date.format(timestamp(), 'ms', 'yyyy-MM-dd HH:mm:ss') AS formattedDate

// Generate UUID (Cypher's built-in randomUUID() does the same without APOC)
CREATE (p:Person {id: apoc.create.uuid(), name: 'Alice'})

// JSON operations
MATCH (p:Person {name: 'Alice'})
RETURN apoc.convert.toJson(p) AS personJson

// Load JSON from URL
CALL apoc.load.json('https://api.example.com/data')
YIELD value
RETURN value

// Periodic commit (batch processing)
CALL apoc.periodic.iterate(
  "MATCH (p:Person) RETURN p",
  "SET p.processed = true",
  {batchSize: 1000}
)

// Run Cypher from file
CALL apoc.cypher.runFile('import.cypher')

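// Conditional execution (apoc.do.when allows writes; apoc.when is read-only)
MATCH (person:Person {name: 'Alice'})
CALL apoc.do.when(
  person.age >= 18,
  "SET person:Adult RETURN person",
  "SET person:Minor RETURN person",
  {person: person}
)
YIELD value
RETURN value.person AS person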

Graph Algorithms

Common graph algorithms for analysis.

// PageRank (requires Graph Data Science library)
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC

// Shortest path
MATCH (start:Person {name: 'Alice'}), (end:Person {name: 'Bob'})
CALL gds.shortestPath.dijkstra.stream('myGraph', {
  sourceNode: start,
  targetNode: end
})
YIELD path
RETURN path

// Community detection (Louvain)
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId

// Centrality measures
CALL gds.degree.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC

Use Cases

Social Network

// Create social network
CREATE (alice:User {name: 'Alice', joined: date('2020-01-01')})
CREATE (bob:User {name: 'Bob', joined: date('2020-02-15')})
CREATE (charlie:User {name: 'Charlie', joined: date('2020-03-20')})
CREATE (alice)-[:FOLLOWS {since: date('2020-02-01')}]->(bob)
CREATE (bob)-[:FOLLOWS {since: date('2020-03-01')}]->(charlie)
CREATE (charlie)-[:FOLLOWS {since: date('2020-04-01')}]->(alice)

// Find mutual follows (friends)
MATCH (u1:User)-[:FOLLOWS]->(u2:User)-[:FOLLOWS]->(u1)
RETURN u1.name, u2.name

// Friend recommendations (friends of friends)
MATCH (user:User {name: 'Alice'})-[:FOLLOWS]->()-[:FOLLOWS]->(recommended:User)
WHERE NOT (user)-[:FOLLOWS]->(recommended) AND user <> recommended
RETURN recommended.name, count(*) AS mutualFriends
ORDER BY mutualFriends DESC
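
To make the friends-of-friends logic concrete, the same computation can be traced in plain Python over the three sample users created above. This is an in-memory illustration of what the query counts, not how Neo4j executes it; the `recommend` function is hypothetical.

```python
from collections import Counter

# FOLLOWS edges from the sample data: follower -> set of followed users
follows = {
    "Alice": {"Bob"},
    "Bob": {"Charlie"},
    "Charlie": {"Alice"},
}


def recommend(user, follows):
    """Count two-hop FOLLOWS paths to users that `user` does not follow yet."""
    already_followed = follows.get(user, set())
    scores = Counter()
    for middle in already_followed:
        for candidate in follows.get(middle, set()):
            # Mirrors: NOT (user)-[:FOLLOWS]->(recommended) AND user <> recommended
            if candidate != user and candidate not in already_followed:
                scores[candidate] += 1
    return scores.most_common()


print(recommend("Alice", follows))
```

Alice follows Bob and Bob follows Charlie, so Charlie is the only recommendation, reached through one intermediate user.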

Recommendation Engine

// Product recommendations based on similar users
MATCH (user:User {name: 'Alice'})-[:PURCHASED]->(product:Product)
      <-[:PURCHASED]-(other:User)-[:PURCHASED]->(recommendation:Product)
WHERE NOT (user)-[:PURCHASED]->(recommendation)
RETURN recommendation.name, count(*) AS score
ORDER BY score DESC
LIMIT 5

// Collaborative filtering
MATCH (user:User {name: 'Alice'})-[r1:RATED]->(product:Product)
      <-[r2:RATED]-(other:User)
WHERE abs(r1.rating - r2.rating) < 2
// Carry user through WITH so the later WHERE clause can still reference it
WITH user, other, count(*) AS similarity
ORDER BY similarity DESC
LIMIT 10
MATCH (other)-[r:RATED]->(recommendation:Product)
WHERE NOT (user)-[:RATED]->(recommendation) AND r.rating >= 4
RETURN recommendation.name, avg(r.rating) AS avgRating, count(*) AS ratingCount
ORDER BY avgRating DESC, ratingCount DESC

Knowledge Graph

// Create knowledge graph
CREATE (python:Technology {name: 'Python', type: 'Language'})
CREATE (django:Technology {name: 'Django', type: 'Framework'})
CREATE (web:Domain {name: 'Web Development'})
CREATE (django)-[:BUILT_WITH]->(python)
CREATE (django)-[:USED_FOR]->(web)

// Find all technologies for a domain
MATCH (tech:Technology)-[:USED_FOR]->(domain:Domain {name: 'Web Development'})
RETURN tech.name, tech.type

// Find technology stack (dependencies)
MATCH path = (tech:Technology {name: 'Django'})-[:BUILT_WITH*]->(dependency)
RETURN path

Backup & Restore

Neo4j add-ons use neo4j-admin dump for backups, creating full graph database dumps.

Backup Configuration

  • Tool: neo4j-admin dump
  • Format: .dump
  • Includes: Full graph database dump
  • Storage: AWS S3 (s3://strongly-backups/backups/<addon-id>/)

Manual Backup

  1. Go to add-on details page
  2. Click Backup Now
  3. Monitor progress in job logs
  4. Backup saved as backup-YYYYMMDDHHMMSS.dump
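
The timestamp in the backup file name can be parsed to check how old a backup is, for example when auditing a retention window. A minimal sketch, assuming the `backup-YYYYMMDDHHMMSS.dump` pattern above; the time zone of the timestamp is not stated here, so the comparison treats it as naive. The helper names are illustrative.

```python
import re
from datetime import datetime, timedelta

# Matches names such as backup-20240131093000.dump
_BACKUP_NAME = re.compile(r"^backup-(\d{14})\.dump$")


def backup_timestamp(filename):
    """Extract the creation timestamp encoded in a backup file name."""
    match = _BACKUP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected backup name: {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def within_retention(filename, now, retention_days=7):
    """Return True if the backup is no older than the retention window."""
    return now - backup_timestamp(filename) <= timedelta(days=retention_days)


name = "backup-20240131093000.dump"  # hypothetical file name
print(backup_timestamp(name))
print(within_retention(name, now=datetime(2024, 2, 5)))
```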

Scheduled Backups

Configure during add-on creation or in settings:

  • Daily backups: Recommended for production
  • Retention: 7-14 days minimum for production
  • Custom cron: For specific schedules

Restore Process

  1. Navigate to Backups tab
  2. Select backup from list
  3. Click Restore
  4. Confirm (add-on will stop temporarily)
  5. Data restored using neo4j-admin load
  6. Add-on automatically restarts

Warning (Data Loss): Restoring from a backup replaces ALL current data. Create a fresh backup first if you need to keep the current state.

Performance Optimization

Query Optimization

// Use PROFILE to analyze query execution
PROFILE
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.age > 25
RETURN p.name, c.name

// Use EXPLAIN to see execution plan
EXPLAIN
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.age > 25
RETURN p.name, c.name

// Use indexes for better performance
CREATE INDEX person_age FOR (p:Person) ON (p.age)

// Limit early in the query
MATCH (p:Person)
WHERE p.age > 25
WITH p
ORDER BY p.age DESC
LIMIT 100
MATCH (p)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name

Data Modeling Best Practices

  1. Model for Queries: Design graph based on how you'll query it
  2. Use Specific Relationship Types: More specific is better than generic
  3. Denormalize When Needed: Duplicate data for query performance
  4. Index Wisely: Index properties used in WHERE clauses
  5. Avoid Super Nodes: Nodes with millions of relationships slow down queries

Monitoring

Monitor your Neo4j add-on through the Strongly platform:

  • CPU Usage: Track CPU utilization
  • Memory Usage: Monitor heap and page cache
  • Disk Space: Watch database size
  • Transaction Count: Active transactions
  • Query Performance: Slow query detection

Database Statistics

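// Database info
// JMX bean names differ between Neo4j versions. If this returns no rows,
// list the available beans with: CALL dbms.queryJmx('*:*') YIELD name RETURN name
CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Store sizes')
YIELD attributes
RETURN attributes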

// Count nodes by label
MATCH (n:Person)
RETURN count(n) AS personCount

// Count relationships by type
MATCH ()-[r:WORKS_FOR]->()
RETURN count(r) AS worksForCount

// Database constraints and indexes
SHOW CONSTRAINTS
SHOW INDEXES

Best Practices

  1. Use Indexes: Index properties used frequently in lookups
  2. Specific Relationship Types: Use descriptive relationship names
  3. Limit Result Sets: Use LIMIT to prevent large result sets
  4. Profile Queries: Use PROFILE/EXPLAIN for optimization
  5. Batch Operations: Use APOC for large data imports
  6. Avoid Cartesian Products: Use proper MATCH patterns
  7. Use Parameters: Parameterize queries for security and performance
  8. Model Carefully: Design schema for your access patterns
  9. Monitor Performance: Track slow queries
  10. Regular Backups: Enable daily backups for production
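
Practices 5 and 7 can also be combined on the client side: instead of APOC, rows are split into batches and each batch is sent as a single `$rows` parameter that the query unwinds. This is a sketch of that alternative, not the platform's recommended import path; `chunked`, `UPSERT_PEOPLE`, and the generated rows are illustrative.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# One parameterized statement per batch; UNWIND expands the list server-side
UPSERT_PEOPLE = (
    "UNWIND $rows AS row "
    "MERGE (p:Person {email: row.email}) "
    "SET p.name = row.name"
)

# Made-up input data for illustration
rows = [{"email": f"user{i}@example.com", "name": f"User {i}"} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(rows, 1000)]
print(batch_sizes)
# With an open driver session:
# for batch in chunked(rows, 1000):
#     session.run(UPSERT_PEOPLE, rows=batch)
```

A unique constraint or index on `Person.email` keeps the MERGE lookups fast as the data grows.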

Troubleshooting

Connection Issues

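# Test connection
from neo4j import GraphDatabase

# Fill in the values from the add-on details page or STRONGLY_SERVICES
uri = "bolt://host:7687"
username = "username"
password = "password"

driver = GraphDatabase.driver(uri, auth=(username, password))
try:
    driver.verify_connectivity()
    print("Connected successfully")
except Exception as e:
    print(f"Connection failed: {e}")
finally:
    driver.close()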
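
If the add-on is still starting (for example right after a restore), a single connectivity check may fail even though the configuration is correct. One way to handle this is to retry with an increasing delay. The sketch below is a generic helper, not part of the Neo4j driver; `retry`, its defaults, and the `flaky` stand-in are assumptions for illustration.

```python
import time


def retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for driver.verify_connectivity that fails twice, then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("not ready")
    return "connected"


delays = []  # record the delays instead of actually sleeping
print(retry(flaky, sleep=delays.append), delays)
```

With a real driver, the call would look like `retry(driver.verify_connectivity)`, leaving `sleep` at its default.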

Performance Issues

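// Find long-running queries
// (Neo4j 5 removed dbms.listQueries() in favor of SHOW TRANSACTIONS)
SHOW TRANSACTIONS
YIELD transactionId, currentQuery, elapsedTime
WHERE elapsedTime.milliseconds > 1000
RETURN transactionId, currentQuery, elapsedTime
ORDER BY elapsedTime DESC

// Terminate a long-running transaction
// (replaces dbms.killQuery(); use a transactionId from the output above)
TERMINATE TRANSACTIONS 'transaction-id'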

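// Check memory usage
// JMX bean names differ between Neo4j versions. If this returns no rows,
// list the available beans with: CALL dbms.queryJmx('*:*') YIELD name RETURN name
CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Memory Pools')
YIELD attributes
RETURN attributes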

Support

For issues or questions:

  • Check add-on logs in the Strongly dashboard
  • Review Neo4j official documentation
  • Contact Strongly support through the platform