Skip to content

Neo4j Schema and Constraints

This page documents the Neo4j schema initialization system, including uniqueness constraints, range indexes, and database creation. It covers how the Neo4j repository ensures data integrity and query performance through structured schema definitions.

For details on how these constraints are used during entity, edge, and tag operations, see Neo4j Entity Operations, Neo4j Edge Operations, and Neo4j Tag Management. For broader migration system context, see Neo4j Schema Initialization.

Schema Initialization Overview

The Neo4j schema is initialized through the InitializeSchema function, which creates the database, establishes constraints, and defines indexes. This initialization occurs automatically when a new Neo4j repository is created via assetdb.New.

Initialization Flow

flowchart TD
    Start["InitializeSchema(driver, dbname)"]
    CreateDB["CREATE DATABASE IF NOT EXISTS"]
    StartDB["START DATABASE WAIT 10 SECONDS"]
    CoreConstraints["Apply Core Constraints"]
    CoreIndexes["Apply Core Indexes"]
    ContentConstraints["Apply Content Constraints"]
    Done["Schema Ready"]

    Start --> CreateDB
    CreateDB --> StartDB
    StartDB --> CoreConstraints
    CoreConstraints --> CoreIndexes
    CoreIndexes --> ContentConstraints
    ContentConstraints --> Done

    CoreConstraints --> EntityConstraint["Entity.entity_id UNIQUE"]
    CoreConstraints --> EntityTagConstraint["EntityTag.tag_id UNIQUE"]
    CoreConstraints --> EdgeTagConstraint["EdgeTag.tag_id UNIQUE"]

    CoreIndexes --> EntityEtype["Entity.etype INDEX"]
    CoreIndexes --> EntityUpdated["Entity.updated_at INDEX"]
    CoreIndexes --> TagIndexes["Tag Indexes"]

    ContentConstraints --> AssetConstraints["20+ Asset Type Constraints"]

Core Schema Components

The Neo4j schema defines three primary node types with associated constraints and indexes:

Entity Nodes

Property Constraint Type Purpose
entity_id UNIQUE Ensures each entity has a unique identifier
etype RANGE INDEX Enables efficient queries by asset type
updated_at RANGE INDEX Supports temporal queries and filtering

EntityTag Nodes

Property Constraint Type Purpose
tag_id UNIQUE Ensures each tag has a unique identifier
ttype RANGE INDEX Enables queries by property type
updated_at RANGE INDEX Supports temporal filtering
entity_id RANGE INDEX Optimizes tag lookup by entity

EdgeTag Nodes

Property Constraint Type Purpose
tag_id UNIQUE Ensures each tag has a unique identifier
ttype RANGE INDEX Enables queries by property type
updated_at RANGE INDEX Supports temporal filtering
edge_id RANGE INDEX Optimizes tag lookup by edge

Constraint Implementation Details

The schema uses Cypher's CREATE CONSTRAINT syntax with the IF NOT EXISTS clause to ensure idempotent initialization.

graph TB
    subgraph "Entity Constraint"
        EntityNode["(:Entity)"]
        EntityProp["n.entity_id"]
        EntityConstraint["constraint_entities_entity_id"]

        EntityNode --> EntityProp
        EntityProp --> EntityConstraint
    end

    subgraph "EntityTag Constraint"
        TagNode["(:EntityTag)"]
        TagProp["n.tag_id"]
        TagConstraint["constraint_enttag_tag_id"]

        TagNode --> TagProp
        TagProp --> TagConstraint
    end

    subgraph "EdgeTag Constraint"
        EdgeTagNode["(:EdgeTag)"]
        EdgeTagProp["n.tag_id"]
        EdgeTagConstraint["constraint_edgetag_tag_id"]

        EdgeTagNode --> EdgeTagProp
        EdgeTagProp --> EdgeTagConstraint
    end

executeQuery Helper

The executeQuery function provides a wrapper around Neo4j's ExecuteQuery API, abstracting the context and transaction handling:

sequenceDiagram
    participant Init as InitializeSchema
    participant Exec as executeQuery
    participant Driver as neo4jdb.Driver
    participant DB as Neo4j Database

    Init->>Exec: executeQuery(driver, dbname, query)
    Exec->>Driver: ExecuteQuery(context, driver, query)
    Driver->>DB: Execute Cypher query
    DB-->>Driver: Result or error
    Driver-->>Exec: Result or error
    Exec-->>Init: error or nil

Content-Specific Constraints

Each Open Asset Model asset type has its own constraint ensuring uniqueness on a natural key property. The entitiesContentIndexes function defines constraints for all 21 supported asset types.

Asset Type Constraint Mapping

Asset Type Node Label Unique Property Constraint Name
Account :Account unique_id constraint_account_content_unique_id
AutnumRecord :AutnumRecord handle, number constraint_autnum_content_handle, constraint_autnum_content_number
AutonomousSystem :AutonomousSystem number constraint_autsys_content_number
ContactRecord :ContactRecord discovered_at constraint_contact_record_content_discovered_at
DomainRecord :DomainRecord domain constraint_domainrec_content_domain
File :File url constraint_file_content_url
FQDN :FQDN name constraint_fqdn_content_name
FundsTransfer :FundsTransfer unique_id constraint_ft_content_unique_id
Identifier :Identifier unique_id constraint_identifier_content_unique_id
IPAddress :IPAddress address constraint_ipaddr_content_address
IPNetRecord :IPNetRecord handle constraint_ipnetrec_content_handle
Location :Location address constraint_location_content_name
Netblock :Netblock cidr constraint_netblock_content_cidr
Organization :Organization unique_id constraint_org_content_id
Person :Person unique_id constraint_person_content_id
Phone :Phone e164, raw constraint_phone_content_e164, constraint_phone_content_raw
Product :Product unique_id constraint_product_content_id
ProductRelease :ProductRelease name constraint_productrelease_content_name
Service :Service unique_id constraint_service_content_id
TLSCertificate :TLSCertificate serial_number constraint_tls_content_serial_number
URL :URL url constraint_url_content_url

Additional Content Indexes

Beyond uniqueness constraints, certain asset types have additional range indexes for frequently queried properties:

Asset Type Indexed Property Purpose
IPNetRecord cidr Efficient CIDR range queries
Organization name, legal_name Name-based searches
Person full_name Name-based searches
Product product_name Name-based searches

Schema-to-Code Mapping

graph LR
    subgraph "Neo4j Schema"
        EntityLabel["(:Entity)"]
        AccountLabel["(:Account)"]
        FQDNLabel["(:FQDN)"]
        IPLabel["(:IPAddress)"]
    end

    subgraph "Go Code Structs"
        EntityStruct["types.Entity"]
        AccountStruct["account.Account"]
        FQDNStruct["dns.FQDN"]
        IPStruct["network.IPAddress"]
    end

    subgraph "Conversion Functions"
        NodeToEntity["nodeToEntity()"]
        NodeToAccount["nodeToAccount()"]
        NodeToFQDN["nodeToFQDN()"]
        NodeToIP["nodeToIPAddress()"]
    end

    EntityLabel --> NodeToEntity
    AccountLabel --> NodeToAccount
    FQDNLabel --> NodeToFQDN
    IPLabel --> NodeToIP

    NodeToEntity --> EntityStruct
    NodeToAccount --> AccountStruct
    NodeToFQDN --> FQDNStruct
    NodeToIP --> IPStruct

Constraint Enforcement

Neo4j enforces these constraints at write time, preventing duplicate entities and ensuring data integrity:

sequenceDiagram
    participant Repo as neoRepository
    participant DB as Neo4j Database
    participant Schema as Constraint Engine

    Note over Repo,Schema: Creating Entity with FQDN "example.com"

    Repo->>DB: CREATE (e:Entity {entity_id: uuid})
    DB->>Schema: Check constraint_entities_entity_id
    Schema-->>DB: OK (unique entity_id)

    Repo->>DB: CREATE (f:FQDN {name: "example.com"})
    DB->>Schema: Check constraint_fqdn_content_name

    alt FQDN already exists
        Schema-->>DB: Constraint violation
        DB-->>Repo: Error: "example.com" already exists
    else FQDN is unique
        Schema-->>DB: OK (unique name)
        DB-->>Repo: Node created
    end

Index Types and Performance

Neo4j uses range indexes for all non-constraint indexes, enabling efficient range queries, equality checks, and sorting operations.

Index Usage Patterns

Index Query Pattern Example Cypher
entities_range_index_etype Filter by asset type MATCH (e:Entity) WHERE e.etype = 'FQDN'
entities_range_index_updated_at Temporal queries MATCH (e:Entity) WHERE e.updated_at > datetime('2024-01-01')
enttag_range_index_entity_id Tag lookup MATCH (t:EntityTag) WHERE t.entity_id = $id
org_range_index_name Name searches MATCH (o:Organization) WHERE o.name CONTAINS 'Example'

Database Creation

The initialization process begins by ensuring the database exists and is started:

flowchart LR
    CreateCmd["CREATE DATABASE dbname<br/>IF NOT EXISTS"]
    StartCmd["START DATABASE dbname<br/>WAIT 10 SECONDS"]
    Ready["Database Ready"]

    CreateCmd --> StartCmd
    StartCmd --> Ready

These commands are executed with best-effort error handling (errors are ignored) since the database may already exist or be started.

Error Handling

The schema initialization uses fail-fast error handling for constraints and indexes. If any constraint or index creation fails, the entire initialization process returns an error, preventing partial schema states.

graph TD
    Start["InitializeSchema()"]
    CreateConstraint["Create Constraint"]
    CheckError{"Error?"}
    NextStep["Next Constraint/Index"]
    ReturnError["return err"]
    Complete["return nil"]

    Start --> CreateConstraint
    CreateConstraint --> CheckError
    CheckError -->|Yes| ReturnError
    CheckError -->|No| NextStep
    NextStep --> CreateConstraint
    NextStep --> Complete

Property Extraction from Nodes

The schema's property names directly correspond to the neo4jdb.GetProperty calls in entity extraction functions:

graph TB
    subgraph "FQDN Schema Constraint"
        FQDNNode["(:FQDN)"]
        NameProp["n.name UNIQUE"]
        FQDNNode --> NameProp
    end

    subgraph "nodeToFQDN Function"
        GetProp["neo4jdb.GetProperty[string](node, 'name')"]
        FQDNStruct["&dns.FQDN{Name: name}"]
        GetProp --> FQDNStruct
    end

    NameProp -.->|"Same property name"| GetProp

Multi-Property Constraints

Some asset types require multiple unique constraints to ensure different forms of the same data are treated as distinct:

Phone Number Example

The Phone asset type has two separate uniqueness constraints: - constraint_phone_content_e164 on the standardized E.164 format - constraint_phone_content_raw on the raw input string

This allows the same phone number to exist in multiple formats while preventing exact duplicates.

AutnumRecord Example

The AutnumRecord asset type has two constraints: - constraint_autnum_content_handle on the registry handle - constraint_autnum_content_number on the AS number

This prevents duplicates by both registry identifier and numeric AS value.