Neo4j Repository¶

The Neo4j Repository provides a graph database implementation of the Repository interface, enabling asset storage and retrieval using Neo4j's native graph data model. This implementation leverages Neo4j's Cypher query language and graph traversal capabilities to efficiently manage entities, edges, and their associated tags.

This page covers the Neo4j repository architecture, connection management, data model mapping, and query patterns. For detailed operation-level documentation, see: - Entity operations: Neo4j Entity Operations - Edge operations: Neo4j Edge Operations - Tag operations: Neo4j Tag Management - Schema and constraints: Neo4j Schema and Constraints

For SQL-based implementations, see SQL Repository.

Repository Implementation Structure¶

The neoRepository struct implements the Repository interface using the Neo4j Go Driver. It maintains a driver connection and database name for all operations.

graph TB
    subgraph "Repository Interface Layer"
        RepoInterface["repository.Repository<br/>Interface"]
    end

    subgraph "Neo4j Implementation"
        NeoRepo["neoRepository struct<br/>repository/neo4j/db.go"]
        Driver["db: neo4jdb.DriverWithContext<br/>Neo4j Go Driver v5"]
        DBName["dbname: string<br/>Target database name"]
    end

    subgraph "Core Operations"
        EntityOps["Entity Operations<br/>CreateEntity, FindEntityById<br/>FindEntitiesByContent<br/>FindEntitiesByType, DeleteEntity"]
        EdgeOps["Edge Operations<br/>CreateEdge, FindEdgeById<br/>IncomingEdges, OutgoingEdges<br/>DeleteEdge"]
        TagOps["Tag Operations<br/>CreateEntityTag, GetEntityTags<br/>CreateEdgeTag, GetEdgeTags"]
    end

    subgraph "Extraction Layer"
        ExtractEntity["nodeToEntity()<br/>extract_entity.go"]
        ExtractTags["nodeToEntityTag()<br/>nodeToEdgeTag()<br/>extract_tags.go"]
        ExtractProp["nodeToProperty()<br/>extract_property.go"]
    end

    subgraph "Neo4j Database"
        Nodes["Nodes:<br/>Entity + AssetType labels<br/>EntityTag, EdgeTag"]
        Rels["Relationships:<br/>Typed edges between entities"]
    end

    RepoInterface -.->|"implements"| NeoRepo
    NeoRepo --> Driver
    NeoRepo --> DBName

    NeoRepo --> EntityOps
    NeoRepo --> EdgeOps
    NeoRepo --> TagOps

    EntityOps --> ExtractEntity
    TagOps --> ExtractTags
    ExtractTags --> ExtractProp
    ExtractEntity --> ExtractProp

    EntityOps -.->|"neo4jdb.ExecuteQuery()"| Nodes
    EdgeOps -.->|"neo4jdb.ExecuteQuery()"| Rels
    TagOps -.->|"neo4jdb.ExecuteQuery()"| Nodes

Connection and Initialization¶

The New() function creates a neoRepository instance by parsing the DSN, establishing a driver connection, and verifying connectivity.

DSN Format¶

The DSN format is: neo4j://[username:password@]host[:port]/database

Connection Configuration¶

Parameter	Value	Description
`MaxConnectionPoolSize`	20	Maximum concurrent connections
`MaxConnectionLifetime`	1 hour	Connection lifetime before refresh
`ConnectionLivenessCheckTimeout`	10 minutes	Timeout for liveness checks

sequenceDiagram
    participant Client
    participant New["New() function<br/>db.go:27-59"]
    participant URLParse["url.Parse()<br/>DSN parsing"]
    participant Driver["neo4jdb.NewDriverWithContext()<br/>Driver creation"]
    participant Verify["driver.VerifyConnectivity()<br/>Connection test"]
    participant Repo["neoRepository"]

    Client->>New: New(dbtype, dsn)
    New->>URLParse: Parse DSN
    URLParse-->>New: URL components
    New->>New: Extract auth credentials
    New->>New: Extract database name from path
    New->>Driver: Create driver with config
    Driver-->>New: neo4jdb.DriverWithContext
    New->>Verify: Test connectivity (5s timeout)
    alt Connection successful
        Verify-->>New: nil error
        New->>Repo: Create neoRepository
        Repo-->>Client: Return repository
    else Connection failed
        Verify-->>New: error
        New-->>Client: Return error
    end

Entity-to-Node Mapping¶

Entities are stored as Neo4j nodes with dual labels: a base Entity label and an asset-specific type label (e.g., FQDN, IPAddress, Organization). All asset properties are flattened into node properties.

Node Structure¶

Each entity node contains: - entity_id: Unique identifier (constrained) - etype: Asset type string - created_at: Creation timestamp (LocalDateTime) - updated_at: Last seen timestamp (LocalDateTime) - Asset-specific properties: Flattened from OAM asset structure

graph LR
    subgraph "types.Entity Struct"
        Entity["Entity<br/>ID: string<br/>CreatedAt: time.Time<br/>LastSeen: time.Time<br/>Asset: oam.Asset"]
        Asset["oam.Asset<br/>e.g., dns.FQDN,<br/>network.IPAddress,<br/>org.Organization"]
    end

    subgraph "Neo4j Node"
        Node["Node<br/>Labels: Entity + AssetType<br/>Properties: Flattened"]
        Props["entity_id: string<br/>etype: string<br/>created_at: LocalDateTime<br/>updated_at: LocalDateTime<br/>+ asset-specific fields"]
    end

    subgraph "Conversion Functions"
        ToProps["entityPropsMap()<br/>Flatten entity to map"]
        ToEntity["nodeToEntity()<br/>extract_entity.go:29-111"]
        TypeSwitch["Asset type switch<br/>nodeToFQDN()<br/>nodeToIPAddress()<br/>nodeToOrganization()<br/>etc."]
    end

    Entity --> Asset
    Entity -->|"Create"| ToProps
    ToProps --> Node
    Node -->|"Read"| ToEntity
    ToEntity --> TypeSwitch
    TypeSwitch --> Entity
    Node --> Props

Supported Asset Types¶

The nodeToEntity() function supports conversion for all OAM asset types:

Asset Type	Converter Function	Unique Key Property
`Account`	`nodeToAccount()`	`unique_id`
`AutnumRecord`	`nodeToAutnumRecord()`	`handle`, `number`
`AutonomousSystem`	`nodeToAutonomousSystem()`	`number`
`ContactRecord`	`nodeToContactRecord()`	`discovered_at`
`DomainRecord`	`nodeToDomainRecord()`	`domain`
`File`	`nodeToFile()`	`url`
`FQDN`	`nodeToFQDN()`	`name`
`FundsTransfer`	`nodeToFundsTransfer()`	`unique_id`
`Identifier`	`nodeToIdentifier()`	`unique_id`
`IPAddress`	`nodeToIPAddress()`	`address`
`IPNetRecord`	`nodeToIPNetRecord()`	`handle`
`Location`	`nodeToLocation()`	`address`
`Netblock`	`nodeToNetblock()`	`cidr`
`Organization`	`nodeToOrganization()`	`unique_id`
`Person`	`nodeToPerson()`	`unique_id`
`Phone`	`nodeToPhone()`	`e164`, `raw`
`Product`	`nodeToProduct()`	`unique_id`
`ProductRelease`	`nodeToProductRelease()`	`name`
`Service`	`nodeToService()`	`unique_id`
`TLSCertificate`	`nodeToTLSCertificate()`	`serial_number`
`URL`	`nodeToURL()`	`url`

Cypher Query Patterns¶

The Neo4j repository uses standardized Cypher query patterns for consistency and performance.

Entity Creation Pattern¶

CREATE (a:Entity:{AssetType} $props) RETURN a

Example for FQDN:

CREATE (a:Entity:FQDN {
  entity_id: "uuid-here",
  etype: "FQDN",
  created_at: localdatetime(),
  updated_at: localdatetime(),
  name: "example.com"
}) RETURN a

Entity Retrieval Patterns¶

Operation	Cypher Pattern
By ID	`MATCH (a:Entity {entity_id: $eid}) RETURN a`
By Content	`MATCH (a:{AssetType} {key: value}) RETURN a`
By Type	`MATCH (a:{AssetType}) RETURN a`
By Type + Since	`MATCH (a:{AssetType}) WHERE a.updated_at >= localDateTime($since) RETURN a`

Entity Update Pattern¶

MATCH (a:Entity {entity_id: $eid})
SET a = $props
RETURN a

Entity Deletion Pattern¶

MATCH (n:Entity {entity_id: $eid})
DETACH DELETE n

The DETACH DELETE ensures all relationships are removed before node deletion.

Duplicate Prevention Strategy¶

The repository prevents duplicate entities by checking for existing entities before creation. When a duplicate is detected, the existing entity's updated_at timestamp is refreshed rather than creating a new node.

sequenceDiagram
    participant Client
    participant CreateEntity["CreateEntity()<br/>entity.go:23-124"]
    participant FindContent["FindEntitiesByContent()"]
    participant UpdateNode["MATCH + SET query"]
    participant CreateNode["CREATE query"]
    participant ExtractNode["nodeToEntity()"]

    Client->>CreateEntity: CreateEntity(input)
    CreateEntity->>FindContent: Check for existing entity
    alt Entity exists
        FindContent-->>CreateEntity: Existing entity found
        CreateEntity->>CreateEntity: Validate asset type match
        CreateEntity->>UpdateNode: Update existing node timestamp
        UpdateNode-->>CreateEntity: Updated node
        CreateEntity->>ExtractNode: Convert to Entity
        ExtractNode-->>Client: Return existing entity
    else Entity not found
        FindContent-->>CreateEntity: No entity found
        CreateEntity->>CreateEntity: Generate UUID if needed
        CreateEntity->>CreateEntity: Set timestamps
        CreateEntity->>CreateNode: CREATE new node
        CreateNode-->>CreateEntity: New node
        CreateEntity->>ExtractNode: Convert to Entity
        ExtractNode-->>Client: Return new entity
    end

Time Handling¶

Neo4j uses LocalDateTime for timestamp storage, which doesn't include timezone information. The repository converts between Go's time.Time and Neo4j's LocalDateTime format.

Conversion Functions¶

Go Time to Neo4j Format:

func timeToNeo4jTime(t time.Time) string {
    return t.Format("2006-01-02T15:04:05")
}

Neo4j Format to Go Time:

func neo4jTimeToTime(t neo4jdb.LocalDateTime) time.Time {
    return time.Date(
        t.Year(), time.Month(t.Month()), t.Day(),
        t.Hour(), t.Minute(), t.Second(),
        t.Nanosecond(), time.UTC,
    )
}

All timestamps are stored and compared in UTC.

Property Extraction Pipeline¶

The property extraction pipeline converts Neo4j node properties back into typed Go structures. This is a multi-stage process that handles different property types.

graph TD
    subgraph "Entry Point"
        NodeToEntity["nodeToEntity()<br/>extract_entity.go:29"]
    end

    subgraph "Asset Type Dispatch"
        TypeSwitch["switch on oam.AssetType<br/>Cases for 21 asset types"]
    end

    subgraph "Asset Converters"
        FQDN["nodeToFQDN()<br/>line 337"]
        IPAddr["nodeToIPAddress()<br/>line 446"]
        Org["nodeToOrganization()<br/>line 670"]
        Cert["nodeToTLSCertificate()<br/>line 901"]
        Other["... 17 more converters"]
    end

    subgraph "Property Extraction"
        GetProp["neo4jdb.GetProperty[T]()<br/>Type-safe extraction"]
        ParseIP["netip.ParseAddr()<br/>netip.ParsePrefix()"]
        TypeConv["Type conversions<br/>int64 -> int<br/>[]interface{} -> []string"]
    end

    subgraph "Result"
        EntityStruct["types.Entity<br/>with typed Asset"]
    end

    NodeToEntity --> TypeSwitch
    TypeSwitch --> FQDN
    TypeSwitch --> IPAddr
    TypeSwitch --> Org
    TypeSwitch --> Cert
    TypeSwitch --> Other

    FQDN --> GetProp
    IPAddr --> GetProp
    Org --> GetProp
    Cert --> GetProp

    IPAddr --> ParseIP

    GetProp --> TypeConv
    TypeConv --> EntityStruct

Error Handling¶

Each property extraction uses neo4jdb.GetProperty[T]() which returns an error if: - The property doesn't exist on the node - The property type doesn't match the expected type T

All errors are propagated up the call stack, ensuring that malformed data is never silently ignored.

Context and Timeout Management¶

All Neo4j operations use context-based timeouts for reliability. The standard timeout is 30 seconds for query execution.

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

result, err := neo4jdb.ExecuteQuery(ctx, neo.db, query, params,
    neo4jdb.EagerResultTransformer,
    neo4jdb.ExecuteQueryWithDatabase(neo.dbname))

This pattern ensures: - Operations don't hang indefinitely - Resources are properly released via defer cancel() - Database-specific queries are routed correctly via ExecuteQueryWithDatabase()

Database-Specific Query Execution¶

All queries specify the target database using neo4jdb.ExecuteQueryWithDatabase(neo.dbname). This allows multi-database Neo4j deployments where different databases serve different purposes.

The database name is extracted from the DSN path during initialization:

dbname := strings.TrimPrefix(u.Path, "/")

For example, DSN neo4j://localhost:7687/assetdb results in dbname = "assetdb".

Summary¶

The Neo4j Repository implementation provides:

Feature	Implementation
Connection Management	Neo4j Go Driver v5 with connection pooling
Data Model	Dual-labeled nodes (Entity + AssetType) with flattened properties
Query Language	Cypher with parameterized queries
Duplicate Prevention	Content-based lookup before creation
Type Safety	Type-safe property extraction with error handling
Timeout Management	30-second context-based timeouts
Asset Type Support	All 21 OAM asset types
Schema Management	Constraints and indexes for performance (see Neo4j Schema and Constraints)

The implementation leverages Neo4j's native graph capabilities while maintaining compatibility with the unified Repository interface, enabling seamless database backend switching.

Neo4j Entity Operations¶

This document describes entity management operations in the Neo4j graph database repository implementation. It covers creating, querying, updating, and deleting entity nodes, as well as the conversion between Neo4j nodes and the application's types.Entity structures.

For information about edge (relationship) operations, see Neo4j Edge Operations. For tag management, see Neo4j Tag Management. For schema constraints and indexes, see Neo4j Schema and Constraints.

Overview of Entity Operations¶

The neoRepository provides five core entity operations that interact with Neo4j nodes labeled as :Entity:

Operation	Method	Description
Create	`CreateEntity(*types.Entity)`	Creates a new entity node or updates existing if duplicate detected
Create Asset	`CreateAsset(oam.Asset)`	Convenience wrapper for creating entity from asset
Find by ID	`FindEntityById(string)`	Retrieves entity by unique entity_id
Find by Content	`FindEntitiesByContent(oam.Asset, time.Time)`	Searches for entities matching asset content
Find by Type	`FindEntitiesByType(oam.AssetType, time.Time)`	Retrieves all entities of a specific asset type
Delete	`DeleteEntity(string)`	Removes entity node and its relationships

Entity Creation Flow¶

The CreateEntity method implements a duplicate-prevention mechanism by first checking if an entity with matching content already exists.

CreateEntity Process Diagram¶

flowchart TD
    Start["CreateEntity(input *types.Entity)"] --> ValidateInput["Validate input != nil"]
    ValidateInput --> CheckDuplicate["FindEntitiesByContent(input.Asset, time.Time{})"]

    CheckDuplicate -->|"Found existing"| UpdateExisting["Update existing entity"]
    CheckDuplicate -->|"Not found"| CreateNew["Create new entity"]

    UpdateExisting --> BuildUpdateQuery["queryNodeByAssetKey('a', asset)<br/>Build MATCH query"]
    BuildUpdateQuery --> SetLastSeen["Set e.LastSeen = time.Now()"]
    SetLastSeen --> EntityPropsMap["entityPropsMap(e)"]
    EntityPropsMap --> ExecuteUpdate["ExecuteQuery:<br/>MATCH + SET a = $props"]
    ExecuteUpdate --> ExtractNode1["Extract neo4jdb.Node from result"]
    ExtractNode1 --> NodeToEntity1["nodeToEntity(node)"]

    CreateNew --> AssignID["Generate unique ID if empty:<br/>uniqueEntityID()"]
    AssignID --> SetTimestamps["Set CreatedAt and LastSeen"]
    SetTimestamps --> EntityPropsMap2["entityPropsMap(input)"]
    EntityPropsMap2 --> ExecuteCreate["ExecuteQuery:<br/>CREATE (a:Entity:AssetType $props)"]
    ExecuteCreate --> ExtractNode2["Extract neo4jdb.Node from result"]
    ExtractNode2 --> NodeToEntity2["nodeToEntity(node)"]

    NodeToEntity1 --> Return["Return *types.Entity"]
    NodeToEntity2 --> Return

Key Implementation Details¶

The CreateEntity method performs the following steps:

Duplicate Detection : Calls FindEntitiesByContent to check if an entity with the same asset content exists
Update Path : If duplicate found:
Validates asset types match
Builds Cypher MATCH query using queryNodeByAssetKey
Updates LastSeen timestamp
Executes SET query to update node properties
Create Path : If new entity:
Generates unique ID using uniqueEntityID()
Sets CreatedAt and LastSeen timestamps
Executes CREATE query with entity type label

The method uses neo4jdb.ExecuteQuery with a 30-second timeout context .

Query Operations¶

FindEntityById¶

Retrieves a single entity by its unique entity_id property:

MATCH (a:Entity {entity_id: $eid}) RETURN a

Implementation:

FindEntitiesByContent¶

Searches for entities matching specific asset content. Constructs a MATCH query using queryNodeByAssetKey to match on asset-specific properties:

MATCH (a:AssetType {key_property: value}) WHERE a.updated_at >= localDateTime('...') RETURN a

The since parameter filters entities by their updated_at timestamp. If since.IsZero(), no time filter is applied .

Implementation:

FindEntitiesByType¶

Retrieves all entities of a specific asset type using the asset type as a node label:

MATCH (a:FQDN) WHERE a.updated_at >= localDateTime('...') RETURN a

Returns a slice of []*types.Entity by iterating through all matching records .

Implementation:

DeleteEntity¶

Removes an entity and all its relationships using DETACH DELETE:

MATCH (n:Entity {entity_id: $eid}) DETACH DELETE n

The DETACH clause ensures all incoming and outgoing relationships are deleted first .

Implementation:

Node to Entity Conversion¶

Conversion Architecture¶

flowchart LR
    Node["neo4jdb.Node<br/>(from query result)"] --> NodeToEntity["nodeToEntity(node)"]

    NodeToEntity --> ExtractCommon["Extract common properties:<br/>- entity_id<br/>- created_at<br/>- updated_at<br/>- etype"]

    ExtractCommon --> TypeSwitch["Switch on oam.AssetType"]

    TypeSwitch --> Account["nodeToAccount()"]
    TypeSwitch --> FQDN["nodeToFQDN()"]
    TypeSwitch --> IPAddress["nodeToIPAddress()"]
    TypeSwitch --> Organization["nodeToOrganization()"]
    TypeSwitch --> OtherTypes["... (20+ other types)"]

    Account --> BuildEntity["Construct types.Entity:<br/>- ID<br/>- CreatedAt<br/>- LastSeen<br/>- Asset"]
    FQDN --> BuildEntity
    IPAddress --> BuildEntity
    Organization --> BuildEntity
    OtherTypes --> BuildEntity

    BuildEntity --> Return["Return *types.Entity"]

nodeToEntity Dispatcher¶

The nodeToEntity function extracts common entity properties and dispatches to type-specific converters:

Extract Common Properties:
entity_id (string)
created_at (neo4jdb.LocalDateTime)
updated_at (neo4jdb.LocalDateTime)
etype (asset type string)
Dispatch by Asset Type: Uses a switch statement on oam.AssetType to call the appropriate converter function
Construct Entity: Returns types.Entity with populated fields

Supported Asset Types¶

The following table lists all supported asset types and their conversion functions:

Asset Type	Converter Function	Key Properties	Lines
`Account`	`nodeToAccount`	unique_id, account_type, username, account_number, balance, active	113-152
`AutnumRecord`	`nodeToAutnumRecord`	raw, number, handle, name, whois_server, created_date, updated_date, status	154-211
`AutonomousSystem`	`nodeToAutonomousSystem`	number	213-221
`ContactRecord`	`nodeToContactRecord`	discovered_at	223-230
`DomainRecord`	`nodeToDomainRecord`	raw, id, domain, punycode, name, extension, whois_server, dates, status, dnssec	232-312
`File`	`nodeToFile`	url, name, type	314-335
`FQDN`	`nodeToFQDN`	name	337-344
`FundsTransfer`	`nodeToFundsTransfer`	unique_id, amount, reference_number, currency, transfer_method, exchange_date, exchange_rate	346-391
`Identifier`	`nodeToIdentifier`	unique_id, entity_id, id_type, category, dates, status	393-444
`IPAddress`	`nodeToIPAddress`	address (parsed as netip.Addr), type	446-466
`IPNetRecord`	`nodeToIPNetRecord`	raw, cidr (netip.Prefix), handle, start/end addresses, type, name, method, country, parent_handle, whois_server, dates, status	468-575
`Location`	`nodeToLocation`	address, building, building_number, street_name, unit, po_box, city, locality, province, country, postal_code	577-646
`Netblock`	`nodeToNetblock`	cidr (netip.Prefix), type	648-668
`Organization`	`nodeToOrganization`	unique_id, name, legal_name, founding_date, industry, active, non_profit, num_of_employees	670-722
`Person`	`nodeToPerson`	unique_id, full_name, first_name, middle_name, family_name, birth_date, gender	724-769
`Phone`	`nodeToPhone`	type, raw, e164, country_abbrev, country_code, ext	771-811
`Product`	`nodeToProduct`	unique_id, product_name, product_type, category, description, country_of_origin	813-852
`ProductRelease`	`nodeToProductRelease`	name, release_date	854-869
`Service`	`nodeToService`	unique_id, service_type, output, output_length	871-899
`TLSCertificate`	`nodeToTLSCertificate`	version, serial_number, subject/issuer common names, not_before/after, key_usage, ext_key_usage, signature/public key algorithms, is_ca, crl_distribution_points, subject/authority key IDs	901-1003
`URL`	`nodeToURL`	url, scheme, username, password, host, port, path, options, fragment	1005-1063

Property Extraction Patterns¶

String Properties¶

Most string properties use neo4jdb.GetProperty[string]:

name, err := neo4jdb.GetProperty[string](node, "name")
if err != nil {
    return nil, err
}

Example:

Numeric Properties¶

Integer properties are extracted as int64 and converted to int:

num, err := neo4jdb.GetProperty[int64](node, "number")
if err != nil {
    return nil, err
}
number := int(num)

Example:

Network Address Properties¶

IP addresses and CIDR blocks use netip package types:

ip, err := neo4jdb.GetProperty[string](node, "address")
if err != nil {
    return nil, err
}
addr, err := netip.ParseAddr(ip)
if err != nil {
    return nil, err
}

Example:

Array Properties¶

String arrays are extracted as []interface{} and converted:

list, err := neo4jdb.GetProperty[[]interface{}](node, "status")
if err != nil {
    return nil, err
}
var status []string
for _, s := range list {
    status = append(status, s.(string))
}

Example:

Duplicate Prevention Mechanism¶

Duplicate Detection Flow¶

sequenceDiagram
    participant Client
    participant CreateEntity
    participant FindEntitiesByContent
    participant Neo4j

    Client->>CreateEntity: CreateEntity(entity)
    CreateEntity->>FindEntitiesByContent: Check for duplicate
    FindEntitiesByContent->>Neo4j: MATCH by content properties

    alt Entity exists
        Neo4j-->>FindEntitiesByContent: Return existing entity
        FindEntitiesByContent-->>CreateEntity: []*types.Entity (len > 0)
        CreateEntity->>CreateEntity: Validate asset type matches
        CreateEntity->>CreateEntity: Update LastSeen timestamp
        CreateEntity->>Neo4j: MATCH + SET update query
        Neo4j-->>CreateEntity: Updated node
        CreateEntity->>CreateEntity: nodeToEntity conversion
        CreateEntity-->>Client: Return updated entity
    else Entity does not exist
        Neo4j-->>FindEntitiesByContent: No results
        FindEntitiesByContent-->>CreateEntity: Error
        CreateEntity->>CreateEntity: Generate unique ID
        CreateEntity->>CreateEntity: Set CreatedAt, LastSeen
        CreateEntity->>Neo4j: CREATE new entity node
        Neo4j-->>CreateEntity: New node
        CreateEntity->>CreateEntity: nodeToEntity conversion
        CreateEntity-->>Client: Return new entity
    end

Uniqueness Guarantees¶

The system prevents duplicates through two mechanisms:

Application-Level Deduplication: The CreateEntity method checks for existing entities before creating new ones
Database Constraints: Neo4j unique constraints on content properties ensure database-level uniqueness. See schema initialization in which creates constraints like:
FQDN.name must be unique
IPAddress.address must be unique
Organization.unique_id must be unique

Unique ID Generation¶

The uniqueEntityID function generates universally unique identifiers for entities:

func (neo *neoRepository) uniqueEntityID() string {
    for {
        id := uuid.New().String()
        if _, err := neo.FindEntityById(id); err != nil {
            return id
        }
    }
}

This implementation: 1. Generates a UUID v4 using google/uuid package 2. Verifies uniqueness by attempting to find an entity with that ID 3. Loops until a unique ID is found (collision is extremely unlikely with UUID v4)

Test Coverage¶

The entity operations are validated through integration tests in . Key test scenarios include:

Test	Coverage	Lines
`TestCreateEntity`	Duplicate prevention, timestamp handling	23-67
`TestFindEntityById`	ID-based retrieval	69-88
`TestFindEntitiesByContent`	Content-based search, time filtering	90-114
`TestFindEntitiesByType`	Type-based queries with time filters	116-148
`TestDeleteEntity`	Entity deletion	150-163

Tests verify: - Duplicate entities are not created - LastSeen timestamps are updated on duplicates - CreatedAt remains unchanged for duplicates - Content-based queries respect since parameter - Type-based queries return multiple entities

Neo4j Edge Operations¶

This document describes the edge (relationship) operations in the Neo4j repository implementation. Edges represent directed relationships between entities in the graph database, enabling traversal and querying of asset connections. For entity-level operations, see Neo4j Entity Operations. For tag management on edges, see Neo4j Tag Management.

Overview¶

The Neo4j repository stores edges as native graph relationships in Neo4j, using relationship types derived from the Open Asset Model. Each edge connects two entity nodes and includes temporal metadata (creation time, last seen) along with relationship-specific properties.

Key Features: - Relationship validation against OAM taxonomy - Duplicate edge detection and timestamp updates - Bidirectional edge traversal (incoming/outgoing) - Temporal filtering with since parameter - Label-based filtering for relationship types - Native Cypher query execution

Edge Data Model¶

Edges in Neo4j are represented as relationships between entity nodes. The relationship type corresponds to the OAM relation label (e.g., DNS_RECORD, NODE, SIMPLE_RELATION).

graph LR
    subgraph "Graph Structure"
        FromEntity["(from:Entity)<br/>entity_id: UUID"]
        ToEntity["(to:Entity)<br/>entity_id: UUID"]
        Relationship["[r:RELATIONSHIP_TYPE]<br/>edge_id: elementId<br/>created_at: LocalDateTime<br/>updated_at: LocalDateTime<br/>+ relation properties"]
    end

    FromEntity -->|"relationship type"| Relationship
    Relationship --> ToEntity

    subgraph "types.Edge Representation"
        EdgeStruct["types.Edge<br/>ID: string<br/>CreatedAt: time.Time<br/>LastSeen: time.Time<br/>Relation: oam.Relation<br/>FromEntity: *types.Entity<br/>ToEntity: *types.Entity"]
    end

    Relationship -.->|"converted to"| EdgeStruct

Edge Creation¶

Validation and Creation Flow¶

The CreateEdge method performs comprehensive validation before creating relationships in the graph database.

flowchart TD
    Start["CreateEdge(edge *types.Edge)"]

    ValidateInput["Validate Input:<br/>- edge != nil<br/>- Relation != nil<br/>- FromEntity/ToEntity exist<br/>- Assets != nil"]

    ValidateOAM["Validate Against OAM:<br/>oam.ValidRelationship()<br/>Check: FromAssetType -[Label]-> ToAssetType"]

    SetTimestamp["Set LastSeen:<br/>if edge.LastSeen.IsZero()<br/>then time.Now()"]

    CheckDup{"isDuplicateEdge()?<br/>Check existing relationships"}

    UpdateExisting["Update existing edge:<br/>edgeSeen(edge, updated)"]

    ReturnExisting["Return existing edge"]

    SetCreated["Set CreatedAt:<br/>if edge.CreatedAt.IsZero()<br/>then time.Now()"]

    BuildProps["Build properties map:<br/>edgePropsMap(edge)"]

    ExecuteCypher["Execute Cypher:<br/>MATCH (from:Entity {entity_id})<br/>MATCH (to:Entity {entity_id})<br/>CREATE (from)-[r:TYPE props]->(to)"]

    ConvertResult["Convert relationship to Edge:<br/>relationshipToEdge(rel)"]

    ReturnNew["Return new edge"]

    Error["Return error"]

    Start --> ValidateInput
    ValidateInput -->|"Valid"| ValidateOAM
    ValidateInput -->|"Invalid"| Error
    ValidateOAM -->|"Valid"| SetTimestamp
    ValidateOAM -->|"Invalid"| Error
    SetTimestamp --> CheckDup
    CheckDup -->|"Duplicate"| UpdateExisting
    UpdateExisting --> ReturnExisting
    CheckDup -->|"New"| SetCreated
    SetCreated --> BuildProps
    BuildProps --> ExecuteCypher
    ExecuteCypher -->|"Success"| ConvertResult
    ExecuteCypher -->|"Error"| Error
    ConvertResult --> ReturnNew

Implementation Details¶

Input Validation¶

The method first validates all required fields:

Validation checks (edge.go:24-27):
- edge != nil
- edge.Relation != nil
- edge.FromEntity != nil && edge.FromEntity.Asset != nil
- edge.ToEntity != nil && edge.ToEntity.Asset != nil

OAM Taxonomy Validation¶

The relationship must be valid according to the Open Asset Model taxonomy:

oam.ValidRelationship(
    edge.FromEntity.Asset.AssetType(),
    edge.Relation.Label(),
    edge.Relation.RelationType(),
    edge.ToEntity.Asset.AssetType()
)

Returns error if invalid: "{FromType} -{Label}-> {ToType} is not valid in the taxonomy"

Cypher Query Construction¶

The relationship is created using a three-part Cypher query:

MATCH (from:Entity {entity_id: '{fromID}'})
MATCH (to:Entity {entity_id: '{toID}'})
CREATE (from)-[r:RELATIONSHIP_TYPE $props]->(to)
RETURN r

The relationship type is uppercased from the relation label (e.g., dns_record → DNS_RECORD).

Duplicate Edge Handling¶

Duplicate Detection Logic¶

The isDuplicateEdge method prevents duplicate relationships between the same entities with identical relation content.

flowchart TD
    Start["isDuplicateEdge(edge, updated)"]

    GetOutgoing["Get outgoing edges:<br/>OutgoingEdges(edge.FromEntity)"]

    IterateEdges["Iterate through outgoing edges"]

    CheckMatch{"Match found?<br/>- ToEntity.ID match<br/>- DeepEqual(Relations)"}

    UpdateTimestamp["Update timestamp:<br/>edgeSeen(out, updated)"]

    FetchEdge["Fetch updated edge:<br/>FindEdgeById(out.ID)"]

    ReturnTrue["Return (edge, true)"]

    ReturnFalse["Return (nil, false)"]

    Start --> GetOutgoing
    GetOutgoing --> IterateEdges
    IterateEdges --> CheckMatch
    CheckMatch -->|"Match"| UpdateTimestamp
    UpdateTimestamp --> FetchEdge
    FetchEdge --> ReturnTrue
    CheckMatch -->|"No match"| IterateEdges
    IterateEdges -->|"No more edges"| ReturnFalse

Duplicate Criteria: 1. Same ToEntity.ID 2. reflect.DeepEqual(edge.Relation, out.Relation) returns true

Timestamp Update¶

When a duplicate is detected, the edgeSeen method updates the updated_at timestamp:

Cypher query (edge.go:117):
MATCH ()-[r]->() 
WHERE elementId(r) = $eid 
SET r.updated_at = localDateTime('{timestamp}')

Edge Retrieval¶

Find Edge By ID¶

The FindEdgeById method retrieves a specific edge using its Neo4j element ID:

MATCH (from:Entity)-[r]->(to:Entity) 
WHERE elementId(r) = $eid 
RETURN r, from.entity_id AS fid, to.entity_id AS tid

Returns: - types.Edge with populated FromEntity.ID and ToEntity.ID - Error if edge not found

Edge Traversal¶

Incoming Edges¶

The IncomingEdges method finds all edges pointing to a specific entity:

Method Signature:

IncomingEdges(entity *types.Entity, since time.Time, labels ...string) ([]*types.Edge, error)

Cypher Queries:

Condition	Query Pattern
No temporal filter	`MATCH (:Entity {entity_id: $eid})<-[r]-(from:Entity) RETURN r, from.entity_id AS fid`
With `since` filter	`MATCH (:Entity {entity_id: $eid})<-[r]-(from:Entity) WHERE r.updated_at >= localDateTime('{since}') RETURN r, from.entity_id AS fid`

Label Filtering:

After query execution, results are filtered by relationship type if labels are specified:

Post-processing (edge.go:214-227):
- If labels provided, check each relationship
- Compare r.Type against provided labels (case-insensitive)
- Only include matching relationships

Outgoing Edges¶

The OutgoingEdges method finds all edges originating from a specific entity:

Method Signature:

OutgoingEdges(entity *types.Entity, since time.Time, labels ...string) ([]*types.Edge, error)

Cypher Queries:

Condition	Query Pattern
No temporal filter	`MATCH (:Entity {entity_id: $eid})-[r]->(to:Entity) RETURN r, to.entity_id AS tid`
With `since` filter	`MATCH (:Entity {entity_id: $eid})-[r]->(to:Entity) WHERE r.updated_at >= localDateTime('{since}') RETURN r, to.entity_id AS tid`

Label Filtering:

Identical to incoming edges, with post-processing label filtering.

Traversal Patterns¶

graph TD
    subgraph "Incoming Edge Traversal"
        ExtFrom1["(from1:Entity)"]
        ExtFrom2["(from2:Entity)"]
        ExtFrom3["(from3:Entity)"]
        Target["(target:Entity)<br/>{entity_id: 'xyz'}"]

        ExtFrom1 -->|"[r1:TYPE1]"| Target
        ExtFrom2 -->|"[r2:TYPE2]"| Target
        ExtFrom3 -->|"[r3:TYPE1]"| Target
    end

    subgraph "Outgoing Edge Traversal"
        Source["(source:Entity)<br/>{entity_id: 'abc'}"]
        ExtTo1["(to1:Entity)"]
        ExtTo2["(to2:Entity)"]
        ExtTo3["(to3:Entity)"]

        Source -->|"[r4:TYPE1]"| ExtTo1
        Source -->|"[r5:TYPE2]"| ExtTo2
        Source -->|"[r6:TYPE1]"| ExtTo3
    end

    subgraph "Filter Options"
        SinceFilter["Temporal Filter:<br/>r.updated_at >= since"]
        LabelFilter["Label Filter:<br/>r.Type in labels"]
    end

Edge Deletion¶

The DeleteEdge method removes a relationship from the graph:

Cypher Query:

MATCH ()-[r]->() 
WHERE elementId(r) = $eid 
DELETE r

Note: This only deletes the relationship; entity nodes remain intact.

Data Conversion¶

Relationship to Edge Conversion¶

The Neo4j driver returns relationships as neo4jdb.Relationship objects, which must be converted to types.Edge:

flowchart LR
    NeoRel["neo4jdb.Relationship<br/>- ElementId<br/>- Type<br/>- Properties<br/>- StartNodeElementId<br/>- EndNodeElementId"]

    Extract["Extract Properties:<br/>- edge_id (elementId)<br/>- created_at<br/>- updated_at<br/>- relation properties"]

    ConvertRel["Convert Relation:<br/>relationshipPropsToRelation()"]

    BuildEdge["types.Edge<br/>- ID: string<br/>- CreatedAt: time.Time<br/>- LastSeen: time.Time<br/>- Relation: oam.Relation"]

    NeoRel --> Extract
    Extract --> ConvertRel
    ConvertRel --> BuildEdge

Key Functions:

Function	Purpose	Location
`relationshipToEdge`	Converts Neo4j relationship to `types.Edge`
`relationshipPropsToRelation`	Extracts OAM relation from relationship properties
`edgePropsMap`	Creates property map for relationship creation

Testing¶

The Neo4j edge operations include comprehensive integration tests:

Test	Description	File Reference
`TestCreateEdge`	Tests edge creation, validation, and duplicate handling
`TestFindEdgeById`	Tests edge retrieval by ID
`TestIncomingEdges`	Tests incoming edge traversal and filtering
`TestOutgoingEdges`	Tests outgoing edge traversal and filtering
`TestDeleteEdge`	Tests edge deletion

Test Scenarios: - Invalid label validation - Duplicate edge detection with timestamp updates - Temporal filtering with since parameter - Label filtering with multiple relationship types - Edge deletion and verification

Error Handling¶

The edge operations return errors in the following scenarios:

Error Condition	Error Message	Method
Null inputs	"failed input validation checks"	`CreateEdge`
Invalid OAM relationship	"{FromType} -{Label}-> {ToType} is not valid in the taxonomy"	`CreateEdge`
No records returned	"no records returned from the query"	`CreateEdge`
Nil relationship	"the record value for the relationship is nil"	`CreateEdge`, `FindEdgeById`
Edge not found	"no edge was found"	`FindEdgeById`
Zero edges found	"zero edges found"	`IncomingEdges`, `OutgoingEdges`

Performance Considerations¶

Indexes¶

The Neo4j schema includes indexes on relationship properties to optimize edge queries:

Edge-related indexes (schema.go):
- CREATE INDEX edgetag_range_index_edge_id 
  FOR (n:EdgeTag) ON (n.edge_id)

Note: Relationships do not support unique constraints or indexes on their properties in Neo4j. Performance is optimized through: 1. Entity node indexes on entity_id 2. Efficient Cypher query patterns 3. Label filtering in application layer when needed

Query Optimization¶

Best Practices: 1. Use temporal filtering (since parameter) to limit result sets 2. Specify relationship labels to reduce post-processing 3. Index entity nodes for fast relationship endpoint lookups 4. Leverage Neo4j's native graph traversal algorithms

Neo4j Tag Management¶

This document covers tag management operations in the Neo4j repository implementation. Tags are metadata properties attached to entities and edges, implemented as separate nodes in the Neo4j graph database. Tags use the Open Asset Model (OAM) property types to provide structured, type-safe metadata storage.

For entity and edge operations themselves, see Neo4j Entity Operations and Neo4j Edge Operations. For schema initialization including tag node constraints, see Neo4j Schema and Constraints.

Tag Data Model¶

Core Tag Types¶

Tags in Neo4j are represented as separate nodes with relationships to their parent entities or edges. The system supports two tag types:

Tag Type	Purpose	Parent Relationship	Node Label Pattern
`EntityTag`	Properties attached to entities	Links to Entity node via `entity_id`	`:EntityTag:{PropertyType}`
`EdgeTag`	Properties attached to edges	Links to Edge relationship via `edge_id`	`:EdgeTag:{PropertyType}`

Both tag types share a common structure defined in the types package:

EntityTag {
  ID        string         // Unique tag identifier (tag_id)
  CreatedAt time.Time      // Initial creation timestamp
  LastSeen  time.Time      // Most recent update timestamp
  Property  oam.Property   // OAM property instance
  Entity    *types.Entity  // Reference to parent entity
}

EdgeTag {
  ID        string         // Unique tag identifier (tag_id)
  CreatedAt time.Time      // Initial creation timestamp
  LastSeen  time.Time      // Most recent update timestamp
  Property  oam.Property   // OAM property instance
  Edge      *types.Edge    // Reference to parent edge
}

Neo4j Node Structure¶

Tags are stored as nodes with multiple labels for efficient querying:

graph TB
    subgraph "EntityTag Node"
        ET["EntityTag Node<br/>Labels: EntityTag, {PropertyType}<br/><br/>Properties:<br/>tag_id: UUID<br/>entity_id: string<br/>ttype: PropertyType<br/>created_at: LocalDateTime<br/>updated_at: LocalDateTime<br/>+ property-specific fields"]
    end

    subgraph "EdgeTag Node"
        EDT["EdgeTag Node<br/>Labels: EdgeTag, {PropertyType}<br/><br/>Properties:<br/>tag_id: UUID<br/>edge_id: string<br/>ttype: PropertyType<br/>created_at: LocalDateTime<br/>updated_at: LocalDateTime<br/>+ property-specific fields"]
    end

    subgraph "Property Types"
        Simple["SimpleProperty<br/>property_name: string<br/>property_value: string"]
        DNS["DNSRecordProperty<br/>property_name: string<br/>header_rrtype: int<br/>header_class: int<br/>header_ttl: int<br/>data: string"]
        Source["SourceProperty<br/>name: string<br/>confidence: int"]
        Vuln["VulnProperty<br/>vuln_id: string<br/>desc: string<br/>source: string<br/>category: string<br/>enum: string<br/>ref: string"]
    end

    ET -.->|"stores one of"| Simple
    ET -.->|"stores one of"| DNS
    ET -.->|"stores one of"| Source
    ET -.->|"stores one of"| Vuln

    EDT -.->|"stores one of"| Simple
    EDT -.->|"stores one of"| DNS
    EDT -.->|"stores one of"| Source
    EDT -.->|"stores one of"| Vuln

The dual-label system (:EntityTag:{PropertyType}) enables: - Fast filtering by tag type (EntityTag vs EdgeTag) - Type-specific queries without scanning all tags - Efficient property content matching

Entity Tag Operations¶

Tag Creation with Duplicate Handling¶

The CreateEntityTag and CreateEntityProperty functions implement sophisticated duplicate detection and timestamp updating:

sequenceDiagram
    participant Client
    participant Neo as "neoRepository"
    participant DB as "Neo4j Database"

    Client->>Neo: CreateEntityProperty(entity, property)
    Neo->>Neo: Wrap in EntityTag struct

    alt Check for duplicate
        Neo->>DB: FindEntityTagsByContent(property)

        alt Tag exists with same content
            DB-->>Neo: Existing tag found
            Neo->>Neo: Update LastSeen timestamp
            Neo->>Neo: Build props map from tag
            Neo->>DB: MATCH (p:EntityTag) WHERE ...<br/>SET p = $props
            DB-->>Neo: Updated node
            Neo->>Neo: nodeToEntityTag(node)
            Neo-->>Client: Updated EntityTag
        else No duplicate found
            Neo->>Neo: Generate unique tag_id
            Neo->>Neo: Set CreatedAt and LastSeen
            Neo->>Neo: Build props map
            Neo->>DB: CREATE (p:EntityTag:{PropertyType} $props)
            DB-->>Neo: New node created
            Neo->>Neo: nodeToEntityTag(node)
            Neo-->>Client: New EntityTag
        end
    end

Key Implementation Details:

Function	Purpose	Cypher Pattern
`CreateEntityTag`	Main tag creation	`CREATE (p:EntityTag:{PropertyType} $props)`
`CreateEntityProperty`	Convenience wrapper	Delegates to `CreateEntityTag`
`uniqueEntityTagID`	Generate unique UUID	Loop until `FindEntityTagById` fails
`entityTagPropsMap`	Serialize tag to map	Flatten Property fields to node properties

The duplicate detection strategy at ensures: 1. Properties with identical content reuse existing tags 2. LastSeen timestamp is updated on duplicates 3. Property type must match for updates 4. CreatedAt remains unchanged for duplicates

Finding Tags by ID¶

FindEntityTagById retrieves a specific tag using its unique identifier:

Query: MATCH (p:EntityTag {tag_id: $tid}) RETURN p

The function at : 1. Executes parameterized Cypher query with 30-second timeout 2. Returns error if no records found 3. Extracts node from result record 4. Calls nodeToEntityTag for property deserialization

Content-Based Tag Search¶

FindEntityTagsByContent searches for tags matching a specific property value:

graph LR
    Input["oam.Property<br/>+ since timestamp"]
    Query["queryNodeByPropertyKeyValue()"]
    Match["MATCH (p:EntityTag:{Type})<br/>WHERE key='value'<br/>AND updated_at >= since"]
    Results["[]*types.EntityTag"]

    Input --> Query
    Query --> Match
    Match --> Results

The queryNodeByPropertyKeyValue helper function (defined elsewhere in the codebase) constructs type-specific MATCH patterns. For example, a SimpleProperty with name="test" and value="foo" generates:

MATCH (p:EntityTag:SimpleProperty {property_name: 'test', property_value: 'foo'})

Time filtering is applied when since is non-zero:

WHERE p.updated_at >= localDateTime('2025-01-15T10:30:00')

Retrieving All Entity Tags¶

GetEntityTags fetches all tags for a specific entity with optional filtering:

Parameter	Type	Purpose
`entity`	`*types.Entity`	Parent entity to query
`since`	`time.Time`	Filter by update timestamp (ignored if zero)
`names`	`...string`	Optional property names to filter (returns all if empty)

Query Patterns:

# Without time filter
MATCH (p:EntityTag {entity_id: '{entity.ID}'}) RETURN p

# With time filter
MATCH (p:EntityTag {entity_id: '{entity.ID}'})
WHERE p.updated_at >= localDateTime('{since}')
RETURN p

Post-query filtering at applies the names parameter in-memory by comparing Property.Name() against the provided list.

Tag Deletion¶

DeleteEntityTag removes a tag node completely:

MATCH (n:EntityTag {tag_id: $tid})
DETACH DELETE n

The DETACH DELETE ensures any relationships to the tag are also removed, maintaining graph integrity.

Edge Tag Operations¶

Edge tag operations mirror entity tag operations with different node labels and parent relationships. The implementation follows the same patterns but uses :EdgeTag labels and edge_id properties instead of entity_id.

Key Functions:

graph TB
    subgraph "Edge Tag API"
        CreateEdgeTag["CreateEdgeTag(edge, tag)"]
        CreateEdgeProp["CreateEdgeProperty(edge, prop)"]
        FindById["FindEdgeTagById(id)"]
        FindByContent["FindEdgeTagsByContent(prop, since)"]
        GetTags["GetEdgeTags(edge, since, names...)"]
        Delete["DeleteEdgeTag(id)"]
    end

    subgraph "Shared Extraction Logic"
        NodeToTag["nodeToEdgeTag(node)"]
        NodeToProp["nodeToProperty(node, ptype)"]
    end

    CreateEdgeTag --> NodeToTag
    FindById --> NodeToTag
    FindByContent --> NodeToTag
    GetTags --> NodeToTag
    NodeToTag --> NodeToProp

The implementation reuses the same property extraction logic as entity tags, differing only in: - Node label: :EdgeTag instead of :EntityTag - Parent reference field: edge_id instead of entity_id - Return type: *types.EdgeTag instead of *types.EntityTag

Property Extraction System¶

Converting Neo4j Nodes to Tags¶

The extraction functions transform Neo4j nodes back into Go structs:

graph TB
    Node["neo4jdb.Node<br/>Raw Neo4j node"]

    subgraph "Tag Extraction"
        EntityExtract["nodeToEntityTag()"]
        EdgeExtract["nodeToEdgeTag()"]
    end

    subgraph "Property Extraction"
        PropExtract["nodeToProperty(node, ptype)"]
    end

    subgraph "Type-Specific Extraction"
        DNS["nodeToDNSRecordProperty()"]
        Simple["nodeToSimpleProperty()"]
        Source["nodeToSourceProperty()"]
        Vuln["nodeToVulnProperty()"]
    end

    Node --> EntityExtract
    Node --> EdgeExtract

    EntityExtract --> PropExtract
    EdgeExtract --> PropExtract

    PropExtract --> DNS
    PropExtract --> Simple
    PropExtract --> Source
    PropExtract --> Vuln

Extraction Flow:

Read Common Fields - Extract tag_id, entity_id/edge_id, timestamps
Determine Property Type - Read ttype field to identify OAM property type
Dispatch to Type Handler - Call appropriate nodeTo{Type}Property function
Assemble Tag Struct - Combine extracted data into EntityTag or EdgeTag

Property Type Handlers¶

Each OAM property type has a dedicated extraction function:

Property Type	Extractor Function	Key Fields Extracted
`DNSRecordProperty`	`nodeToDNSRecordProperty`	`property_name`, `header_rrtype`, `header_class`, `header_ttl`, `data`
`SimpleProperty`	`nodeToSimpleProperty`	`property_name`, `property_value`
`SourceProperty`	`nodeToSourceProperty`	`name`, `confidence`
`VulnProperty`	`nodeToVulnProperty`	`vuln_id`, `desc`, `source`, `category`, `enum`, `ref`

Example: SimpleProperty Extraction

demonstrates the straightforward extraction:

1. Call neo4jdb.GetProperty[string](node, "property_name")
2. Call neo4jdb.GetProperty[string](node, "property_value")
3. Return &general.SimpleProperty with both fields

Each extractor uses neo4jdb.GetProperty[T] for type-safe property access. Errors propagate upward if required fields are missing.

Time Management¶

Timestamp Fields¶

Tags maintain two timestamps for tracking:

Field	Purpose	Update Strategy
`CreatedAt`	Initial creation time	Set once, never updated
`LastSeen`	Most recent observation	Updated on every duplicate detection

Both timestamps use Neo4j's LocalDateTime type for storage, converted via helper functions:

Go time.Time --> Neo4j LocalDateTime: timeToNeo4jTime(t)
Neo4j LocalDateTime --> Go time.Time: neo4jTimeToTime(t)

Timestamp Behavior in Queries:

# Time filter format
WHERE p.updated_at >= localDateTime('2025-01-15T10:30:00.123456789')

The since parameter in query functions filters tags by updated_at (alias for LastSeen), enabling temporal queries for recently modified tags.

Duplicate Detection Strategy¶

Content Matching Logic¶

The duplicate detection system at prevents redundant tag creation:

graph TD
    Start["CreateEntityTag called"]
    Query["FindEntityTagsByContent(property)"]

    Decision{"Existing tag<br/>found?"}

    TypeCheck{"Property type<br/>matches?"}
    Update["Update LastSeen<br/>Build props map<br/>SET p = $props"]
    Create["Generate new tag_id<br/>Set CreatedAt/LastSeen<br/>CREATE new node"]

    Error["Return error:<br/>type mismatch"]
    Return["Return tag"]

    Start --> Query
    Query --> Decision

    Decision -->|Yes| TypeCheck
    Decision -->|No| Create

    TypeCheck -->|Yes| Update
    TypeCheck -->|No| Error

    Update --> Return
    Create --> Return
    Error --> Return

Property Type Validation:

The check at ensures type consistency:

if input.Property.PropertyType() != t.Property.PropertyType() {
    return error
}

This prevents a SimpleProperty from updating a DNSRecordProperty with the same name, maintaining data integrity.

Test Coverage¶

The integration tests in verify:

Test Case	Coverage
`TestEntityTag`	Full lifecycle: create, find, duplicate handling, timestamp updates, query by name, deletion
`TestEdgeTag`	Parallel verification for edge tags

Key Test Scenarios:

Initial Creation - Verify CreatedAt and LastSeen are set correctly
Duplicate Detection - Second creation with same property updates LastSeen only
Value Change - New property value creates new tag with later CreatedAt
Filtered Retrieval - GetEntityTags with name filter returns correct subset
Deletion - Tag removal and subsequent lookup failure

Neo4j Schema and Constraints¶

This page documents the Neo4j schema initialization system, including uniqueness constraints, range indexes, and database creation. It covers how the Neo4j repository ensures data integrity and query performance through structured schema definitions.

For details on how these constraints are used during entity, edge, and tag operations, see Neo4j Entity Operations, Neo4j Edge Operations, and Neo4j Tag Management. For broader migration system context, see Neo4j Schema Initialization.

Schema Initialization Overview¶

The Neo4j schema is initialized through the InitializeSchema function, which creates the database, establishes constraints, and defines indexes. This initialization occurs automatically when a new Neo4j repository is created via assetdb.New.

Initialization Flow¶

flowchart TD
    Start["InitializeSchema(driver, dbname)"]
    CreateDB["CREATE DATABASE IF NOT EXISTS"]
    StartDB["START DATABASE WAIT 10 SECONDS"]
    CoreConstraints["Apply Core Constraints"]
    CoreIndexes["Apply Core Indexes"]
    ContentConstraints["Apply Content Constraints"]
    Done["Schema Ready"]

    Start --> CreateDB
    CreateDB --> StartDB
    StartDB --> CoreConstraints
    CoreConstraints --> CoreIndexes
    CoreIndexes --> ContentConstraints
    ContentConstraints --> Done

    CoreConstraints --> EntityConstraint["Entity.entity_id UNIQUE"]
    CoreConstraints --> EntityTagConstraint["EntityTag.tag_id UNIQUE"]
    CoreConstraints --> EdgeTagConstraint["EdgeTag.tag_id UNIQUE"]

    CoreIndexes --> EntityEtype["Entity.etype INDEX"]
    CoreIndexes --> EntityUpdated["Entity.updated_at INDEX"]
    CoreIndexes --> TagIndexes["Tag Indexes"]

    ContentConstraints --> AssetConstraints["20+ Asset Type Constraints"]

Core Schema Components¶

The Neo4j schema defines three primary node types with associated constraints and indexes:

Entity Nodes¶

Property	Constraint Type	Purpose
`entity_id`	UNIQUE	Ensures each entity has a unique identifier
`etype`	RANGE INDEX	Enables efficient queries by asset type
`updated_at`	RANGE INDEX	Supports temporal queries and filtering

EntityTag Nodes¶

Property	Constraint Type	Purpose
`tag_id`	UNIQUE	Ensures each tag has a unique identifier
`ttype`	RANGE INDEX	Enables queries by property type
`updated_at`	RANGE INDEX	Supports temporal filtering
`entity_id`	RANGE INDEX	Optimizes tag lookup by entity

EdgeTag Nodes¶

Property	Constraint Type	Purpose
`tag_id`	UNIQUE	Ensures each tag has a unique identifier
`ttype`	RANGE INDEX	Enables queries by property type
`updated_at`	RANGE INDEX	Supports temporal filtering
`edge_id`	RANGE INDEX	Optimizes tag lookup by edge

Constraint Implementation Details¶

The schema uses Cypher's CREATE CONSTRAINT syntax with the IF NOT EXISTS clause to ensure idempotent initialization.

graph TB
    subgraph "Entity Constraint"
        EntityNode["(:Entity)"]
        EntityProp["n.entity_id"]
        EntityConstraint["constraint_entities_entity_id"]

        EntityNode --> EntityProp
        EntityProp --> EntityConstraint
    end

    subgraph "EntityTag Constraint"
        TagNode["(:EntityTag)"]
        TagProp["n.tag_id"]
        TagConstraint["constraint_enttag_tag_id"]

        TagNode --> TagProp
        TagProp --> TagConstraint
    end

    subgraph "EdgeTag Constraint"
        EdgeTagNode["(:EdgeTag)"]
        EdgeTagProp["n.tag_id"]
        EdgeTagConstraint["constraint_edgetag_tag_id"]

        EdgeTagNode --> EdgeTagProp
        EdgeTagProp --> EdgeTagConstraint
    end

executeQuery Helper¶

The executeQuery function provides a wrapper around Neo4j's ExecuteQuery API, abstracting the context and transaction handling:

sequenceDiagram
    participant Init as InitializeSchema
    participant Exec as executeQuery
    participant Driver as neo4jdb.Driver
    participant DB as Neo4j Database

    Init->>Exec: executeQuery(driver, dbname, query)
    Exec->>Driver: ExecuteQuery(context, driver, query)
    Driver->>DB: Execute Cypher query
    DB-->>Driver: Result or error
    Driver-->>Exec: Result or error
    Exec-->>Init: error or nil

Content-Specific Constraints¶

Each Open Asset Model asset type has its own constraint ensuring uniqueness on a natural key property. The entitiesContentIndexes function defines constraints for all 21 supported asset types.

Asset Type Constraint Mapping¶

Asset Type	Node Label	Unique Property	Constraint Name
`Account`	`:Account`	`unique_id`	`constraint_account_content_unique_id`
`AutnumRecord`	`:AutnumRecord`	`handle`, `number`	`constraint_autnum_content_handle`, `constraint_autnum_content_number`
`AutonomousSystem`	`:AutonomousSystem`	`number`	`constraint_autsys_content_number`
`ContactRecord`	`:ContactRecord`	`discovered_at`	`constraint_contact_record_content_discovered_at`
`DomainRecord`	`:DomainRecord`	`domain`	`constraint_domainrec_content_domain`
`File`	`:File`	`url`	`constraint_file_content_url`
`FQDN`	`:FQDN`	`name`	`constraint_fqdn_content_name`
`FundsTransfer`	`:FundsTransfer`	`unique_id`	`constraint_ft_content_unique_id`
`Identifier`	`:Identifier`	`unique_id`	`constraint_identifier_content_unique_id`
`IPAddress`	`:IPAddress`	`address`	`constraint_ipaddr_content_address`
`IPNetRecord`	`:IPNetRecord`	`handle`	`constraint_ipnetrec_content_handle`
`Location`	`:Location`	`address`	`constraint_location_content_name`
`Netblock`	`:Netblock`	`cidr`	`constraint_netblock_content_cidr`
`Organization`	`:Organization`	`unique_id`	`constraint_org_content_id`
`Person`	`:Person`	`unique_id`	`constraint_person_content_id`
`Phone`	`:Phone`	`e164`, `raw`	`constraint_phone_content_e164`, `constraint_phone_content_raw`
`Product`	`:Product`	`unique_id`	`constraint_product_content_id`
`ProductRelease`	`:ProductRelease`	`name`	`constraint_productrelease_content_name`
`Service`	`:Service`	`unique_id`	`constraint_service_content_id`
`TLSCertificate`	`:TLSCertificate`	`serial_number`	`constraint_tls_content_serial_number`
`URL`	`:URL`	`url`	`constraint_url_content_url`

Additional Content Indexes¶

Beyond uniqueness constraints, certain asset types have additional range indexes for frequently queried properties:

Asset Type	Indexed Property	Purpose
`IPNetRecord`	`cidr`	Efficient CIDR range queries
`Organization`	`name`, `legal_name`	Name-based searches
`Person`	`full_name`	Name-based searches
`Product`	`product_name`	Name-based searches

Schema-to-Code Mapping¶

graph LR
    subgraph "Neo4j Schema"
        EntityLabel["(:Entity)"]
        AccountLabel["(:Account)"]
        FQDNLabel["(:FQDN)"]
        IPLabel["(:IPAddress)"]
    end

    subgraph "Go Code Structs"
        EntityStruct["types.Entity"]
        AccountStruct["account.Account"]
        FQDNStruct["dns.FQDN"]
        IPStruct["network.IPAddress"]
    end

    subgraph "Conversion Functions"
        NodeToEntity["nodeToEntity()"]
        NodeToAccount["nodeToAccount()"]
        NodeToFQDN["nodeToFQDN()"]
        NodeToIP["nodeToIPAddress()"]
    end

    EntityLabel --> NodeToEntity
    AccountLabel --> NodeToAccount
    FQDNLabel --> NodeToFQDN
    IPLabel --> NodeToIP

    NodeToEntity --> EntityStruct
    NodeToAccount --> AccountStruct
    NodeToFQDN --> FQDNStruct
    NodeToIP --> IPStruct

Constraint Enforcement¶

Neo4j enforces these constraints at write time, preventing duplicate entities and ensuring data integrity:

sequenceDiagram
    participant Repo as neoRepository
    participant DB as Neo4j Database
    participant Schema as Constraint Engine

    Note over Repo,Schema: Creating Entity with FQDN "example.com"

    Repo->>DB: CREATE (e:Entity {entity_id: uuid})
    DB->>Schema: Check constraint_entities_entity_id
    Schema-->>DB: OK (unique entity_id)

    Repo->>DB: CREATE (f:FQDN {name: "example.com"})
    DB->>Schema: Check constraint_fqdn_content_name

    alt FQDN already exists
        Schema-->>DB: Constraint violation
        DB-->>Repo: Error: "example.com" already exists
    else FQDN is unique
        Schema-->>DB: OK (unique name)
        DB-->>Repo: Node created
    end

Index Types and Performance¶

Neo4j uses range indexes for all non-constraint indexes, enabling efficient range queries, equality checks, and sorting operations.

Index Usage Patterns¶

Index	Query Pattern	Example Cypher
`entities_range_index_etype`	Filter by asset type	`MATCH (e:Entity) WHERE e.etype = 'FQDN'`
`entities_range_index_updated_at`	Temporal queries	`MATCH (e:Entity) WHERE e.updated_at > datetime('2024-01-01')`
`enttag_range_index_entity_id`	Tag lookup	`MATCH (t:EntityTag) WHERE t.entity_id = $id`
`org_range_index_name`	Name searches	`MATCH (o:Organization) WHERE o.name CONTAINS 'Example'`

Database Creation¶

The initialization process begins by ensuring the database exists and is started:

flowchart LR
    CreateCmd["CREATE DATABASE dbname<br/>IF NOT EXISTS"]
    StartCmd["START DATABASE dbname<br/>WAIT 10 SECONDS"]
    Ready["Database Ready"]

    CreateCmd --> StartCmd
    StartCmd --> Ready

These commands are executed with best-effort error handling (errors are ignored) since the database may already exist or be started.

Error Handling¶

The schema initialization uses fail-fast error handling for constraints and indexes. If any constraint or index creation fails, the entire initialization process returns an error, preventing partial schema states.

graph TD
    Start["InitializeSchema()"]
    CreateConstraint["Create Constraint"]
    CheckError{"Error?"}
    NextStep["Next Constraint/Index"]
    ReturnError["return err"]
    Complete["return nil"]

    Start --> CreateConstraint
    CreateConstraint --> CheckError
    CheckError -->|Yes| ReturnError
    CheckError -->|No| NextStep
    NextStep --> CreateConstraint
    NextStep --> Complete

Property Extraction from Nodes¶

The schema's property names directly correspond to the neo4jdb.GetProperty calls in entity extraction functions:

graph TB
    subgraph "FQDN Schema Constraint"
        FQDNNode["(:FQDN)"]
        NameProp["n.name UNIQUE"]
        FQDNNode --> NameProp
    end

    subgraph "nodeToFQDN Function"
        GetProp["neo4jdb.GetProperty[string](node, 'name')"]
        FQDNStruct["&dns.FQDN{Name: name}"]
        GetProp --> FQDNStruct
    end

    NameProp -.->|"Same property name"| GetProp

Multi-Property Constraints¶

Some asset types require multiple unique constraints to ensure different forms of the same data are treated as distinct:

Phone Number Example¶

The Phone asset type has two separate uniqueness constraints: - constraint_phone_content_e164 on the standardized E.164 format - constraint_phone_content_raw on the raw input string

This allows the same phone number to exist in multiple formats while preventing exact duplicates.

AutnumRecord Example¶

The AutnumRecord asset type has two constraints: - constraint_autnum_content_handle on the registry handle - constraint_autnum_content_number on the AS number

This prevents duplicates by both registry identifier and numeric AS value.