API Reference

Top-level imports

shacl_bridges — N-to-m semantic mapping via SHACL shapes with SPARQL CONSTRUCT rules.

Typical usage::

from shacl_bridges.io.yaml_reader import load_mapping
from shacl_bridges.core.graph import select_root_class, check_connectivity
from shacl_bridges.core.shacl import generate_shacl
from shacl_bridges.core.diff import run_bridge_from_files, save_result

mapping = load_mapping("bridge.yaml")
root = select_root_class(mapping.source_pattern.triples, mapping.root_class())

issues = check_connectivity(mapping.source_pattern.triples, root)
if issues:
    raise ValueError(f"Disconnected nodes for root {root!r}: {issues}")

shacl_ttl = generate_shacl(mapping, root)
with open("bridge_shape.ttl", "w") as f:
    f.write(shacl_ttl)

result = run_bridge_from_files("data.ttl", "bridge_shape.ttl")
save_result(result, "expanded.ttl", "diff.ttl")

Or use the CLI::

shacl-bridges validate  bridge.yaml
shacl-bridges diagram   bridge.yaml -o diagram.mmd
shacl-bridges generate  bridge.yaml -o bridge_shape.ttl
shacl-bridges run       bridge.yaml data.ttl

load_mapping(path)

Load a :class:BridgeMapping from a YAML file.

Parameters:

Name Type Description Default
path str | Path

Path to the bridge.yaml file.

required

Returns:

Type Description
BridgeMapping

Parsed and structurally validated :class:BridgeMapping.

Raises:

Type Description
ValueError

If required keys are missing or triples are malformed.

FileNotFoundError

If path does not exist.

select_root_class(triples, explicit_root=None)

Return the CURIE of the class that should anchor the SHACL shape.

Selection order:

  1. explicit_root if provided (comes from source_pattern.root in the YAML).
  2. The node with the highest closeness centrality in the undirected view of the source-pattern graph; ties broken by out-degree in the directed view.

Parameters:

Name Type Description Default
triples list[Triple]

Source-pattern triples (source_pattern.triples).

required
explicit_root str | None

CURIE string supplied by the user, or None.

None

Returns:

Type Description
str

CURIE string of the selected root class.

Raises:

Type Description
ValueError

If triples is empty and no explicit_root is given.

check_connectivity(triples, root)

Return a list of nodes NOT connected to root in the source-pattern graph. Connectivity is checked on the undirected view of the graph, so edge direction is ignored.

A non-empty result means the WHERE clause would contain disconnected sub-patterns, causing the SPARQL CONSTRUCT to over-match.

Parameters:

Name Type Description Default
triples list[Triple]

Source-pattern triples.

required
root str

CURIE of the chosen root class.

required

Returns:

Type Description
list[str]

Sorted list of unreachable node CURIEs (empty if fully connected).

generate_shacl(mapping, root_class, shape_name='shapes:BridgeShape')

Generate a complete SHACL Turtle document for the given mapping.

Parameters:

Name Type Description Default
mapping BridgeMapping

Loaded :class:~shacl_bridges.io.yaml_reader.BridgeMapping.

required
root_class str

CURIE of the class that the shape targets (sh:targetClass).

required
shape_name str

Local name for the generated sh:NodeShape.

'shapes:BridgeShape'

Returns:

Type Description
str

Full Turtle string ready to be written to a .ttl file.

run_bridge(data_graph, shacl_graph, inference='rdfs')

Apply a SHACL bridge shape to data_graph and return the result.

Parameters:

Name Type Description Default
data_graph Graph

The instance data to transform.

required
shacl_graph Graph

The generated SHACL shape (containing the SPARQLRule).

required
inference str

Reasoner to apply before validation. "rdfs" is the default and sufficient for most harmonization needs. Pass "none" to disable inference entirely.

'rdfs'

Returns:

Type Description
BridgeResult

A :class:BridgeResult with expanded graph, diff, and report.

run_bridge_from_files(data_path, shacl_path, inference='rdfs')

Convenience wrapper: load graphs from file paths, then call :func:run_bridge.

Parameters:

Name Type Description Default
data_path str | Path

Path to the instance data Turtle file.

required
shacl_path str | Path

Path to the SHACL shape Turtle file.

required
inference str

Reasoner to apply.

'rdfs'

Returns:

Type Description
BridgeResult

A :class:BridgeResult.

save_result(result, expanded_path, diff_path)

Serialize expanded and diff graphs to Turtle files.

Parameters:

Name Type Description Default
result BridgeResult

Output of :func:run_bridge.

required
expanded_path str | Path

Destination path for the expanded graph.

required
diff_path str | Path

Destination path for the diff graph.

required

harmonize_to_turtle(source, destination=None, fmt=None)

Load source in any RDF serialization and re-serialize as Turtle.

This normalizes syntax differences between tools (Protégé RDF/XML, robot OWL/XML, hand-written Turtle, etc.) before the bridge pipeline runs. No inference is applied.

Parameters:

Name Type Description Default
source str | Path

Input RDF file (any serialization).

required
destination str | Path | None

Output .ttl path. When None the file is written alongside source with a .ttl suffix replacing the original extension.

None
fmt str | None

Force an input format string instead of auto-detecting.

None

Returns:

Type Description
Graph

The loaded :class:rdflib.Graph (the in-memory representation after round-tripping through rdflib's parser/serializer).


shacl_bridges.io.yaml_reader

YAML-based bridge mapping loader.

A mapping is defined in a single YAML file (conventionally named bridge.yaml) with five top-level sections:

  • metadata — title, version, creator, license, default justification
  • prefixes — namespace declarations (prefix → IRI)
  • source_pattern — S-P-O triples defining the source design pattern; optional root override
  • target_pattern — S-P-O triples defining the target design pattern
  • class_map — explicit alignment between source and target classes

See docs/yaml_format.md for the full schema and annotated example.
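As a rough illustration of the five sections, the sketch below writes out what a tiny mapping might look like once parsed into a Python dict, and checks for the four sections that load_mapping treats as required (metadata is optional). The class names, the prefix IRI, the list-of-three-strings triple encoding, and the helper missing_sections are assumptions made for illustration; docs/yaml_format.md remains the authority on the actual schema.

```python
# Hypothetical parsed content of a bridge.yaml file (all names are illustrative).
example_mapping = {
    "metadata": {"title": "Example bridge", "version": "0.1.0"},
    "prefixes": {"ex": "http://example.org/"},
    "source_pattern": {
        "root": "ex:Process",  # optional root override
        "triples": [["ex:Process", "ex:hasAgent", "ex:Agent"]],
    },
    "target_pattern": {
        "triples": [["ex:Activity", "ex:performedBy", "ex:Actor"]],
    },
    "class_map": [
        {"source": "ex:Process", "target": "ex:Activity"},
        {"source": "ex:Agent", "target": "ex:Actor"},
    ],
}

# Sections that load_mapping requires; "metadata" may be omitted.
REQUIRED_SECTIONS = ("prefixes", "source_pattern", "target_pattern", "class_map")


def missing_sections(data: dict) -> list[str]:
    """Return the required top-level sections that are absent from *data*."""
    return [key for key in REQUIRED_SECTIONS if key not in data]


print(missing_sections(example_mapping))   # []
print(missing_sections({"prefixes": {}}))  # ['source_pattern', 'target_pattern', 'class_map']
```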

Triple = tuple[str, str, str] module-attribute

A (subject, predicate, object) triple where all three are CURIE strings.

BridgeMapping dataclass

All information that defines one bridge mapping.

Load with :func:load_mapping. Validate with :func:~shacl_bridges.validate.validate_mapping.

Source code in shacl_bridges/io/yaml_reader.py
@dataclass
class BridgeMapping:
    """All information that defines one bridge mapping.

    Load with :func:`load_mapping`. Validate with
    :func:`~shacl_bridges.validate.validate_mapping`.
    """

    prefixes: dict[str, str]
    """Namespace declarations: ``{prefix: IRI}``."""

    source_pattern: SourcePattern
    """The source design pattern with its triples and optional root override."""

    target_pattern: TargetPattern
    """The target design pattern triples."""

    class_map: list[ClassMapEntry]
    """Alignment between source and target classes."""

    metadata: Metadata = field(default_factory=Metadata)
    """Title, version, creator, license, default justification."""

    # ------------------------------------------------------------------
    # Convenience accessors
    # ------------------------------------------------------------------

    def prefix_map(self) -> dict[str, str]:
        """Return ``{prefix: namespace}`` dict for SHACL/SPARQL generation."""
        return dict(self.prefixes)

    def root_class(self) -> str | None:
        """Return the explicitly declared root class CURIE, or *None*."""
        return self.source_pattern.root

    def class_alignment(self) -> dict[str, str]:
        """Return ``{source_curie: target_curie}`` for **regular** (non-derived) entries.

        Entries with a ``derived_iri`` represent *new* instances minted at query
        time and are intentionally excluded here — they are accessed via
        :meth:`derived_class_map` and handled separately by the SPARQL builder.

        When the same source class appears in both a regular and a derived entry
        the regular (non-derived) entry wins and sets the primary ``?this``
        target type.
        """
        result: dict[str, str] = {}
        for e in self.class_map:
            if e.derived_iri is None and e.source not in result:
                result[e.source] = e.target
        return result

    def derived_class_map(self) -> list[ClassMapEntry]:
        """Return only the entries that carry a ``derived_iri`` (instance-split targets)."""
        return [e for e in self.class_map if e.derived_iri is not None]

    def source_classes(self) -> set[str]:
        """Return the set of source CURIEs declared in the class map."""
        return {e.source for e in self.class_map}

    def target_classes(self) -> set[str]:
        """Return the set of target CURIEs declared in the class map (regular + derived)."""
        return {e.target for e in self.class_map}

class_map instance-attribute

Alignment between source and target classes.

metadata = field(default_factory=Metadata) class-attribute instance-attribute

Title, version, creator, license, default justification.

prefixes instance-attribute

Namespace declarations: {prefix: IRI}.

source_pattern instance-attribute

The source design pattern with its triples and optional root override.

target_pattern instance-attribute

The target design pattern triples.

class_alignment()

Return {source_curie: target_curie} for regular (non-derived) entries.

Entries with a derived_iri represent new instances minted at query time and are intentionally excluded here — they are accessed via :meth:derived_class_map and handled separately by the SPARQL builder.

When the same source class appears in both a regular and a derived entry the regular (non-derived) entry wins and sets the primary ?this target type.

Source code in shacl_bridges/io/yaml_reader.py
def class_alignment(self) -> dict[str, str]:
    """Return ``{source_curie: target_curie}`` for **regular** (non-derived) entries.

    Entries with a ``derived_iri`` represent *new* instances minted at query
    time and are intentionally excluded here — they are accessed via
    :meth:`derived_class_map` and handled separately by the SPARQL builder.

    When the same source class appears in both a regular and a derived entry
    the regular (non-derived) entry wins and sets the primary ``?this``
    target type.
    """
    result: dict[str, str] = {}
    for e in self.class_map:
        if e.derived_iri is None and e.source not in result:
            result[e.source] = e.target
    return result
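
To make the precedence rule concrete, here is a small self-contained sketch that applies the same loop to a hypothetical stand-in for ClassMapEntry. The Entry class and the example CURIEs are assumptions for illustration only; the loop body mirrors the method shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for ClassMapEntry with only the fields used here."""
    source: str
    target: str
    derived_iri: Optional[str] = None


def class_alignment(class_map: list[Entry]) -> dict[str, str]:
    # Skip derived entries; the first regular entry for a source class wins.
    result: dict[str, str] = {}
    for e in class_map:
        if e.derived_iri is None and e.source not in result:
            result[e.source] = e.target
    return result


entries = [
    Entry("ex:AgentWithRole", "ex:Agent"),
    Entry("ex:AgentWithRole", "ex:AgentRole", derived_iri="suffix:_role"),
    Entry("ex:AgentWithRole", "ex:Person"),  # later regular duplicate is ignored
]
alignment = class_alignment(entries)
print(alignment)  # {'ex:AgentWithRole': 'ex:Agent'}
```

Note that a second regular entry for the same source class is silently dropped, so the order of entries in class_map matters when duplicates exist.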

derived_class_map()

Return only the entries that carry a derived_iri (instance-split targets).

Source code in shacl_bridges/io/yaml_reader.py
def derived_class_map(self) -> list[ClassMapEntry]:
    """Return only the entries that carry a ``derived_iri`` (instance-split targets)."""
    return [e for e in self.class_map if e.derived_iri is not None]

prefix_map()

Return {prefix: namespace} dict for SHACL/SPARQL generation.

Source code in shacl_bridges/io/yaml_reader.py
def prefix_map(self) -> dict[str, str]:
    """Return ``{prefix: namespace}`` dict for SHACL/SPARQL generation."""
    return dict(self.prefixes)

root_class()

Return the explicitly declared root class CURIE, or None.

Source code in shacl_bridges/io/yaml_reader.py
def root_class(self) -> str | None:
    """Return the explicitly declared root class CURIE, or *None*."""
    return self.source_pattern.root

source_classes()

Return the set of source CURIEs declared in the class map.

Source code in shacl_bridges/io/yaml_reader.py
def source_classes(self) -> set[str]:
    """Return the set of source CURIEs declared in the class map."""
    return {e.source for e in self.class_map}

target_classes()

Return the set of target CURIEs declared in the class map (regular + derived).

Source code in shacl_bridges/io/yaml_reader.py
def target_classes(self) -> set[str]:
    """Return the set of target CURIEs declared in the class map (regular + derived)."""
    return {e.target for e in self.class_map}

ClassMapEntry dataclass

A single source-class → target-class alignment.

For a standard 1-to-1 mapping leave derived_iri as None.

For instance-split mappings — where one source instance must become two target instances (e.g. a conflated "Agent+Role" class splitting into a separate Agent and AgentRole) — add a second entry for the same source class with derived_iri set. The tool will mint a new IRI for the derived instance at query time.

Supported derived_iri forms:

  • "suffix:<string>": append <string> to the source instance IRI. Example: suffix:_role turns ex:agent1 into ex:agent1_role.
Source code in shacl_bridges/io/yaml_reader.py
@dataclass
class ClassMapEntry:
    """A single source-class → target-class alignment.

    For a standard 1-to-1 mapping leave *derived_iri* as *None*.

    For **instance-split** mappings — where one source instance must become two
    target instances (e.g. a conflated "Agent+Role" class splitting into a
    separate ``Agent`` and ``AgentRole``) — add a second entry for the same
    *source* class with ``derived_iri`` set.  The tool will mint a new IRI for
    the derived instance at query time.

    Supported ``derived_iri`` forms:

    * ``"suffix:<string>"`` — append *<string>* to the source instance IRI.
      Example: ``suffix:_role`` turns ``ex:agent1`` into ``ex:agent1_role``.
    """

    source: str
    """CURIE of the source class (must appear in ``source_pattern.triples``)."""

    target: str
    """CURIE of the target class (must appear in ``target_pattern.triples``)."""

    justification: str | None = None
    """SSSOM-style justification CURIE, e.g. ``semapv:ManualMappingCuration``."""

    comment: str | None = None
    """Human-readable explanation of why this mapping is valid."""

    derived_iri: str | None = None
    """IRI minting rule for instance-split targets (see class docstring).
    When *None* the target instance reuses the source instance IRI (standard case)."""
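
The effect of a suffix rule can be modelled in a few lines of Python. This is only a sketch of the string operation: the function name mint_derived_iri is hypothetical, and the tool itself is described as minting the IRI inside the generated SPARQL rather than in Python.

```python
def mint_derived_iri(source_iri: str, derived_iri: str) -> str:
    """Apply a 'suffix:<string>' rule to a source instance IRI (illustrative)."""
    prefix = "suffix:"
    if not derived_iri.startswith(prefix):
        # Mirrors the loader's restriction to the 'suffix:' form.
        raise ValueError(f"unsupported derived_iri form {derived_iri!r}")
    return source_iri + derived_iri[len(prefix):]


print(mint_derived_iri("http://example.org/agent1", "suffix:_role"))
# http://example.org/agent1_role
```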

comment = None class-attribute instance-attribute

Human-readable explanation of why this mapping is valid.

derived_iri = None class-attribute instance-attribute

IRI minting rule for instance-split targets (see class docstring). When None the target instance reuses the source instance IRI (standard case).

justification = None class-attribute instance-attribute

SSSOM-style justification CURIE, e.g. semapv:ManualMappingCuration.

source instance-attribute

CURIE of the source class (must appear in source_pattern.triples).

target instance-attribute

CURIE of the target class (must appear in target_pattern.triples).

Metadata dataclass

Human-readable metadata about the bridge mapping.

Source code in shacl_bridges/io/yaml_reader.py
@dataclass
class Metadata:
    """Human-readable metadata about the bridge mapping."""

    title: str = ""
    version: str = "0.1.0"
    creator: str = ""
    license: str = ""
    mapping_justification: str = "semapv:ManualMappingCuration"
    """Default justification applied to all class-map entries that don't override it."""

mapping_justification = 'semapv:ManualMappingCuration' class-attribute instance-attribute

Default justification applied to all class-map entries that don't override it.

SourcePattern dataclass

The source design pattern: a list of S-P-O triples and an optional root override.

Source code in shacl_bridges/io/yaml_reader.py
@dataclass
class SourcePattern:
    """The source design pattern: a list of S-P-O triples and an optional root override."""

    triples: list[Triple]
    """All triples (core structural + peripheral validation) of the source pattern."""

    root: str | None = None
    """CURIE of the class that should anchor ``sh:targetClass`` and ``?this``.
    When *None* the root is computed automatically via closeness centrality."""

root = None class-attribute instance-attribute

CURIE of the class that should anchor sh:targetClass and ?this. When None the root is computed automatically via closeness centrality.

triples instance-attribute

All triples (core structural + peripheral validation) of the source pattern.

TargetPattern dataclass

The target design pattern: the triples that the bridge CONSTRUCT will produce.

Source code in shacl_bridges/io/yaml_reader.py
@dataclass
class TargetPattern:
    """The target design pattern: the triples that the bridge CONSTRUCT will produce."""

    triples: list[Triple]

load_mapping(path)

Load a :class:BridgeMapping from a YAML file.

Parameters:

Name Type Description Default
path str | Path

Path to the bridge.yaml file.

required

Returns:

Type Description
BridgeMapping

Parsed and structurally validated :class:BridgeMapping.

Raises:

Type Description
ValueError

If required keys are missing or triples are malformed.

FileNotFoundError

If path does not exist.

Source code in shacl_bridges/io/yaml_reader.py
def load_mapping(path: str | Path) -> BridgeMapping:
    """Load a :class:`BridgeMapping` from a YAML file.

    Args:
        path: Path to the ``bridge.yaml`` file.

    Returns:
        Parsed and structurally validated :class:`BridgeMapping`.

    Raises:
        ValueError: If required keys are missing or triples are malformed.
        FileNotFoundError: If *path* does not exist.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        data = yaml.safe_load(f)

    if not isinstance(data, dict):
        raise ValueError(
            f"{path}: YAML root must be a mapping, got {type(data).__name__}"
        )

    # metadata (optional)
    m = data.get("metadata", {}) or {}
    metadata = Metadata(
        title=str(m.get("title", "")),
        version=str(m.get("version", "0.1.0")),
        creator=str(m.get("creator", "")),
        license=str(m.get("license", "")),
        mapping_justification=str(
            m.get("mapping_justification", "semapv:ManualMappingCuration")
        ),
    )

    # prefixes (required)
    raw_prefixes = _require(data, "prefixes")
    if not isinstance(raw_prefixes, dict):
        raise ValueError("'prefixes' must be a mapping of prefix → namespace IRI")
    prefixes = {str(k): str(v) for k, v in raw_prefixes.items()}

    # source_pattern (required)
    sp_raw = _require(data, "source_pattern")
    source_triples_raw = _require(sp_raw, "triples", "source_pattern")
    source_pattern = SourcePattern(
        root=sp_raw.get("root"),
        triples=[_parse_triple(t) for t in source_triples_raw],
    )

    # target_pattern (required)
    tp_raw = _require(data, "target_pattern")
    target_triples_raw = _require(tp_raw, "triples", "target_pattern")
    target_pattern = TargetPattern(
        triples=[_parse_triple(t) for t in target_triples_raw],
    )

    # class_map (required)
    cm_raw = _require(data, "class_map")
    if not isinstance(cm_raw, list):
        raise ValueError("'class_map' must be a list of {source, target, ...} entries")
    class_map: list[ClassMapEntry] = []
    for i, entry in enumerate(cm_raw):
        if not isinstance(entry, dict):
            raise ValueError(
                f"class_map[{i}] must be a mapping, got {type(entry).__name__}"
            )
        derived_iri_raw = entry.get("derived_iri")
        if derived_iri_raw is not None:
            derived_iri_raw = str(derived_iri_raw)
            if not derived_iri_raw.startswith("suffix:"):
                raise ValueError(
                    f"class_map[{i}].derived_iri: unsupported form {derived_iri_raw!r}."
                    " Currently supported: 'suffix:<string>'"
                )
        class_map.append(ClassMapEntry(
            source=str(_require(entry, "source", f"class_map[{i}]")),
            target=str(_require(entry, "target", f"class_map[{i}]")),
            justification=entry.get("justification"),
            comment=entry.get("comment"),
            derived_iri=derived_iri_raw,
        ))

    return BridgeMapping(
        metadata=metadata,
        prefixes=prefixes,
        source_pattern=source_pattern,
        target_pattern=target_pattern,
        class_map=class_map,
    )

shacl_bridges.io.rdf_utils

Utilities for loading and normalizing RDF graphs.

The primary purpose here is syntax harmonization: converting any RDF serialization (RDF/XML, OWL/XML, Turtle, N-Triples, JSON-LD, etc.) to a canonical Turtle representation before the bridge pipeline runs. This avoids blank-node ID collisions and namespace prefix inconsistencies that arise when mixing serialization styles from tools like Protégé, OWLTools, or robot.

No semantic inference is performed here. That belongs in core/diff.py via pyshacl.

harmonize_many(sources, output_dir=None, fmt=None)

Harmonize multiple RDF files to Turtle in one call.

Parameters:

Name Type Description Default
sources list[str | Path]

List of input file paths.

required
output_dir str | Path | None

Directory for output files. When None each file is written next to its source.

None
fmt str | None

Force an input format for all files.

None

Returns:

Type Description
dict[Path, Graph]

Mapping from output path to loaded :class:rdflib.Graph.

Source code in shacl_bridges/io/rdf_utils.py
def harmonize_many(
    sources: list[str | Path],
    output_dir: str | Path | None = None,
    fmt: str | None = None,
) -> dict[Path, Graph]:
    """Harmonize multiple RDF files to Turtle in one call.

    Args:
        sources: List of input file paths.
        output_dir: Directory for output files. When *None* each file is written
                    next to its source.
        fmt: Force an input format for all files.

    Returns:
        Mapping from output path to loaded :class:`rdflib.Graph`.
    """
    results: dict[Path, Graph] = {}
    for src in sources:
        src = Path(src)
        if output_dir is not None:
            dest = Path(output_dir) / src.with_suffix(".ttl").name
        else:
            dest = None
        g = harmonize_to_turtle(src, destination=dest, fmt=fmt)
        out_path = dest if dest is not None else src.with_suffix(".ttl")
        results[out_path] = g
    return results
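
The output-path rule used above can be tried in isolation with pathlib alone. The helper name planned_output and the file names are assumptions for illustration; the path arithmetic is the same as in the listing.

```python
from pathlib import Path


def planned_output(src, output_dir=None) -> Path:
    """Compute where a harmonized copy of *src* would be written (illustrative)."""
    src = Path(src)
    if output_dir is not None:
        # Only the file name is kept; the source's parent directories are dropped.
        return Path(output_dir) / src.with_suffix(".ttl").name
    return src.with_suffix(".ttl")


print(planned_output("onto/a.owl").as_posix())          # onto/a.ttl
print(planned_output("onto/a.owl", "out").as_posix())   # out/a.ttl
print(planned_output("other/a.rdf", "out").as_posix())  # out/a.ttl
```

Two consequences follow from this rule. Inputs that share a stem map to the same output path when output_dir is set, so the later one replaces the earlier one both on disk and in the returned dict. And an input that already ends in .ttl maps to itself when no destination is given, so it is rewritten in place.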

harmonize_to_turtle(source, destination=None, fmt=None)

Load source in any RDF serialization and re-serialize as Turtle.

This normalizes syntax differences between tools (Protégé RDF/XML, robot OWL/XML, hand-written Turtle, etc.) before the bridge pipeline runs. No inference is applied.

Parameters:

Name Type Description Default
source str | Path

Input RDF file (any serialization).

required
destination str | Path | None

Output .ttl path. When None the file is written alongside source with a .ttl suffix replacing the original extension.

None
fmt str | None

Force an input format string instead of auto-detecting.

None

Returns:

Type Description
Graph

The loaded :class:rdflib.Graph (the in-memory representation after round-tripping through rdflib's parser/serializer).

Source code in shacl_bridges/io/rdf_utils.py
def harmonize_to_turtle(
    source: str | Path,
    destination: str | Path | None = None,
    fmt: str | None = None,
) -> Graph:
    """Load *source* in any RDF serialization and re-serialize as Turtle.

    This normalizes syntax differences between tools (Protégé RDF/XML,
    robot OWL/XML, hand-written Turtle, etc.) before the bridge pipeline runs.
    No inference is applied.

    Args:
        source: Input RDF file (any serialization).
        destination: Output ``.ttl`` path. When *None* the file is written
                     alongside *source* with a ``.ttl`` suffix replacing the
                     original extension.
        fmt: Force an input format string instead of auto-detecting.

    Returns:
        The loaded :class:`rdflib.Graph` (the in-memory representation after
        round-tripping through rdflib's parser/serializer).
    """
    source = Path(source)
    g = load_graph(source, fmt=fmt)

    if destination is None:
        destination = source.with_suffix(".ttl")
    destination = Path(destination)

    g.serialize(destination=str(destination), format="turtle")
    return g

load_graph(source, fmt=None)

Load an RDF graph from source, auto-detecting serialization if fmt is None.

Parameters:

Name Type Description Default
source str | Path

File path or URL.

required
fmt str | None

Explicit rdflib format string (e.g. "xml", "turtle"). When None the format is guessed from the file extension.

None

Returns:

Type Description
Graph

A parsed :class:rdflib.Graph.

Source code in shacl_bridges/io/rdf_utils.py
def load_graph(source: str | Path, fmt: str | None = None) -> Graph:
    """Load an RDF graph from *source*, auto-detecting serialization if *fmt* is None.

    Args:
        source: File path or URL.
        fmt: Explicit rdflib format string (e.g. ``"xml"``, ``"turtle"``).
             When *None* the format is guessed from the file extension.

    Returns:
        A parsed :class:`rdflib.Graph`.
    """
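    # Use Path only for extension-based format guessing; pass the original
    # string to rdflib so that URLs are not mangled (Path collapses "//").
    resolved_fmt = fmt or _guess_format(Path(source))
    g = Graph()
    g.parse(str(source), format=resolved_fmt)
    return g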

shacl_bridges.core.graph

Graph analysis utilities used to select the root (target) class for SHACL generation.

The root class becomes sh:targetClass in the generated NodeShape and anchors the ?this variable in the SPARQL CONSTRUCT WHERE clause. Choosing the wrong root produces a disconnected WHERE pattern that over-matches — the most common source of incorrect bridge output.

Two mechanisms are provided:

  1. Explicit override via source_pattern.root in the bridge YAML.
  2. Automatic selection using closeness centrality on the source-pattern graph, with out-degree as a tiebreaker.

build_validation_graph(triples)

Build a directed graph from a list of S-P-O triples.

Each triple (subject, predicate, object) becomes a directed edge subject → object labelled with the predicate.

Parameters:

Name Type Description Default
triples list[Triple]

List of (subject_curie, predicate_curie, object_curie) tuples.

required

Returns:

Type Description
DiGraph

Directed graph with predicate edge attribute.

Source code in shacl_bridges/core/graph.py
def build_validation_graph(triples: list[Triple]) -> nx.DiGraph:
    """Build a directed graph from a list of S-P-O triples.

    Each triple ``(subject, predicate, object)`` becomes a directed edge
    ``subject → object`` labelled with the predicate.

    Args:
        triples: List of ``(subject_curie, predicate_curie, object_curie)`` tuples.

    Returns:
        Directed graph with ``predicate`` edge attribute.
    """
    G = nx.DiGraph()
    for s, p, o in triples:
        G.add_edge(s, o, predicate=p)
    return G

check_connectivity(triples, root)

Return a list of nodes NOT connected to root in the source-pattern graph. Connectivity is checked on the undirected view of the graph, so edge direction is ignored.

A non-empty result means the WHERE clause would contain disconnected sub-patterns, causing the SPARQL CONSTRUCT to over-match.

Parameters:

Name Type Description Default
triples list[Triple]

Source-pattern triples.

required
root str

CURIE of the chosen root class.

required

Returns:

Type Description
list[str]

Sorted list of unreachable node CURIEs (empty if fully connected).

Source code in shacl_bridges/core/graph.py
def check_connectivity(
    triples: list[Triple],
    root: str,
) -> list[str]:
    """Return a list of nodes NOT connected to *root* in the source-pattern graph.

    Connectivity is checked on the undirected view of the graph, so edge
    direction is ignored. A non-empty result means the WHERE clause would
    contain disconnected sub-patterns, causing the SPARQL CONSTRUCT to over-match.

    Args:
        triples: Source-pattern triples.
        root: CURIE of the chosen root class.

    Returns:
        Sorted list of unreachable node CURIEs (empty if fully connected).
    """
    G = build_validation_graph(triples)
    reachable = nx.node_connected_component(G.to_undirected(), root)
    all_nodes = set(G.nodes())
    return sorted(all_nodes - reachable)
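
For readers who want to experiment without networkx, the same check can be approximated with a breadth-first search over an undirected adjacency map. This is a sketch under that assumption, not the library's implementation; the function name unconnected_nodes and the example CURIEs are hypothetical.

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]


def unconnected_nodes(triples: list[Triple], root: str) -> list[str]:
    """Nodes with no path to *root* when edge direction is ignored."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for s, _p, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)  # undirected view: add the reverse link too

    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return sorted(set(neighbours) - seen)


triples = [
    ("ex:A", "ex:p", "ex:B"),
    ("ex:C", "ex:q", "ex:B"),  # ex:C is linked to ex:A only via ex:B
    ("ex:D", "ex:r", "ex:E"),  # separate island
]
print(unconnected_nodes(triples, "ex:A"))  # ['ex:D', 'ex:E']
```

ex:C counts as connected even though no directed path leads from ex:A to it, which is the practical meaning of checking the undirected view.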

longest_path_length(G)

Return the number of edges on the longest path in a DAG.

Parameters:

Name Type Description Default
G DiGraph

A directed acyclic graph.

required

Returns:

Type Description
int

Number of edges (0 for a single-node graph with no edges).

Raises:

Type Description
ValueError

If G is not a DAG.

Source code in shacl_bridges/core/graph.py
def longest_path_length(G: nx.DiGraph) -> int:
    """Return the number of *edges* on the longest path in a DAG.

    Args:
        G: A directed acyclic graph.

    Returns:
        Number of edges (0 for a single-node graph with no edges).

    Raises:
        ValueError: If *G* is not a DAG.
    """
    if not nx.is_directed_acyclic_graph(G):
        raise ValueError("Graph is not a DAG; longest path is undefined.")
    path = nx.dag_longest_path(G)
    return max(0, len(path) - 1)
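
The same quantity can be computed with the standard library's graphlib, which may help when checking a pattern by hand. This is a sketch under the assumption that the graph is given as plain (u, v) edge pairs; the function name longest_path_edges is hypothetical and the code is not part of shacl_bridges.

```python
from graphlib import CycleError, TopologicalSorter


def longest_path_edges(edges: list[tuple[str, str]]) -> int:
    """Number of edges on the longest path of a DAG given as (u, v) pairs."""
    predecessors: dict[str, set[str]] = {}
    for u, v in edges:
        predecessors.setdefault(u, set())
        predecessors.setdefault(v, set()).add(u)

    try:
        order = list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("Graph is not a DAG; longest path is undefined.") from exc

    depth: dict[str, int] = {}
    for node in order:  # every predecessor appears before its successors
        depth[node] = max((depth[p] + 1 for p in predecessors[node]), default=0)
    return max(depth.values(), default=0)


print(longest_path_edges([("a", "b"), ("b", "c"), ("a", "c")]))  # 2
```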

select_root_class(triples, explicit_root=None)

Return the CURIE of the class that should anchor the SHACL shape.

Selection order:

  1. explicit_root if provided (comes from source_pattern.root in the YAML).
  2. The node with the highest closeness centrality in the undirected view of the source-pattern graph; ties broken by out-degree in the directed view.

Parameters:

Name Type Description Default
triples list[Triple]

Source-pattern triples (source_pattern.triples).

required
explicit_root str | None

CURIE string supplied by the user, or None.

None

Returns:

Type Description
str

CURIE string of the selected root class.

Raises:

Type Description
ValueError

If triples is empty and no explicit_root is given.

Source code in shacl_bridges/core/graph.py
def select_root_class(
    triples: list[Triple],
    explicit_root: str | None = None,
) -> str:
    """Return the CURIE of the class that should anchor the SHACL shape.

    Selection order:
    1. *explicit_root* if provided (comes from ``source_pattern.root`` in the YAML).
    2. The node with the highest closeness centrality in the undirected view of
       the source-pattern graph; ties broken by out-degree in the directed view.

    Args:
        triples: Source-pattern triples (``source_pattern.triples``).
        explicit_root: CURIE string supplied by the user, or *None*.

    Returns:
        CURIE string of the selected root class.

    Raises:
        ValueError: If *triples* is empty and no *explicit_root* is given.
    """
    if explicit_root:
        return explicit_root

    G_directed = build_validation_graph(triples)

    if G_directed.number_of_nodes() == 0:
        raise ValueError("source_pattern has no triples; cannot select a root class.")

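    # Undirected view of the pattern graph. Note that nx.Graph stores a single
    # weight per edge, so the second add_edge call below overwrites the first
    # and every edge ends up with weight 2. The ranking is therefore the same
    # as for unweighted closeness; edge direction is not penalised here.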
    G_undirected = nx.Graph()
    for u, v in G_directed.edges():
        G_undirected.add_edge(u, v, weight=1)
        G_undirected.add_edge(v, u, weight=2)

    centrality = nx.closeness_centrality(G_undirected, distance="weight")
    max_val = max(centrality.values())
    candidates = [n for n, c in centrality.items() if c == max_val]

    if len(candidates) == 1:
        return candidates[0]

    # Tiebreak: highest out-degree in the directed graph
    return max(candidates, key=lambda n: G_directed.out_degree(n))
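
The documented selection rule (closeness on the undirected view, out-degree as tiebreaker) can be reproduced without networkx for small patterns. The sketch below assumes a connected pattern and unit edge lengths; the function name pick_root and the example CURIEs are hypothetical, and Fraction is used only so that ties are compared exactly.

```python
from collections import defaultdict, deque
from fractions import Fraction

Triple = tuple[str, str, str]


def pick_root(triples: list[Triple]) -> str:
    """Highest closeness on the undirected view; ties go to the larger out-degree."""
    if not triples:
        raise ValueError("no triples; cannot select a root class")
    neighbours = defaultdict(set)  # undirected view
    successors = defaultdict(set)  # directed view, for the out-degree tiebreak
    for s, _p, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
        successors[s].add(o)

    def closeness(start: str) -> Fraction:
        # Breadth-first search gives hop counts to every reachable node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        return Fraction(len(dist) - 1, total) if total else Fraction(0)

    nodes = list(neighbours)
    scores = {n: closeness(n) for n in nodes}
    best = max(scores.values())
    candidates = [n for n in nodes if scores[n] == best]
    return max(candidates, key=lambda n: len(successors[n]))


triples = [
    ("ex:Process", "ex:hasAgent", "ex:Agent"),
    ("ex:Process", "ex:hasOutput", "ex:Output"),
    ("ex:Agent", "ex:hasRole", "ex:Role"),
]
print(pick_root(triples))  # ex:Process
```

In this example ex:Process and ex:Agent have the same closeness (both sit in the middle of a four-node chain), and ex:Process wins because it has two outgoing edges against one.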

shacl_bridges.core.sparql

SPARQL CONSTRUCT query generation.

Generates the CONSTRUCT { ... } WHERE { ... } block that is embedded inside a sh:SPARQLRule. The WHERE clause is always anchored to ?this (the SHACL convention for the focused node), which guarantees that only subgraphs that are fully connected to the root class are matched.

Variable naming
  • ?this — the focused node (bound to the root class by SHACL)
  • ?var_<suffix> — auto-generated variables for all other nodes, where suffix is a letter sequence (a, b, c, … z, aa, ab, …)
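
One way to produce such a letter sequence is a generator over increasing-length products of the alphabet. This is a sketch of the naming scheme as described, not necessarily how the module builds its names; var_suffixes is a hypothetical helper.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def var_suffixes():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


# Positions 25 to 27 show the rollover from single to double letters.
print([f"?var_{s}" for s in islice(var_suffixes(), 25, 28)])
# ['?var_z', '?var_aa', '?var_ab']
```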

build_sparql_construct(class_alignment, source_triples, target_triples, root_class, prefix_map, derived_entries=None)

Generate a SPARQL CONSTRUCT query from the bridge mapping.

The WHERE clause: - Binds ?this to the root_class (?this rdf:type <root_class>) - Includes only core source triples — those where both the subject and object are source classes present in class_alignment. Peripheral/upper-level triples (e.g. ex:Process isSome ex:ChemicalInvestigation) exist only at the TBox level and are omitted from the SPARQL pattern. - Every core triple produces type assertions for the non-root nodes. - For each derived_entry a BIND(IRI(CONCAT(STR(?this), "…")) AS ?derived_X) line is appended to mint a fresh IRI for the split-off instance.

The CONSTRUCT clause:

  • Asserts new rdf:type triples for each source → target class mapping.
  • Asserts new rdf:type triples for each derived entry (instance split).
  • Asserts the target-pattern relation triples, with variables resolved via the reverse of class_alignment and the derived variable map.
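
To make the variable resolution for target-pattern triples concrete, here is a small sketch. It mirrors the three documented cases (blank-node label, derived instance, reverse class alignment); `resolve_target_node` is a standalone illustration with made-up CURIEs, and returning unmapped nodes unchanged is a simplification of this sketch.

```python
Triple = tuple[str, str, str]


def resolve_target_node(
    node: str,
    class_alignment: dict[str, str],
    var_map: dict[str, str],
    derived_var_map: dict[str, str],
) -> str:
    """Return the SPARQL term used for a target-pattern node."""
    if node.startswith("_:"):
        return node  # blank-node label passes through verbatim
    if node in derived_var_map:
        return derived_var_map[node]  # minted (split-off) instance
    reverse = {tgt: src for src, tgt in class_alignment.items()}
    src = reverse.get(node)
    # Simplification of this sketch: unmapped nodes are returned unchanged.
    return var_map[src] if src is not None else node


class_alignment = {"ex:Process": "tgt:Activity", "ex:Sample": "tgt:Entity"}
var_map = {"ex:Process": "?this", "ex:Sample": "?var_a"}
derived_var_map = {"tgt:Role": "?derived_Role"}
target_triples: list[Triple] = [
    ("tgt:Activity", "tgt:used", "tgt:Entity"),
    ("tgt:Entity", "tgt:hasRole", "tgt:Role"),
    ("tgt:Activity", "tgt:note", "_:n"),
]

construct_lines = [
    "  {} {} {} .".format(
        resolve_target_node(s, class_alignment, var_map, derived_var_map),
        p,
        resolve_target_node(o, class_alignment, var_map, derived_var_map),
    )
    for s, p, o in target_triples
]
print("\n".join(construct_lines))
```

One consequence of reversing a dict is worth keeping in mind: if two source classes map to the same target class, the reverse map keeps only the last of them.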

Parameters:

Name Type Description Default
class_alignment dict[str, str]

{source_curie: target_curie} from the regular (non-derived) class-map entries.

required
source_triples list[Triple]

All triples from source_pattern.triples.

required
target_triples list[Triple]

All triples from target_pattern.triples.

required
root_class str

CURIE of the root class (the sh:targetClass).

required
prefix_map dict[str, str]

{prefix: namespace} dict (used for context; not embedded here).

required
derived_entries list[ClassMapEntry] | None

Entries with a derived_iri field — each describes one instance to be split off from the source and assigned a minted IRI.

None

Returns:

Type Description
str

SPARQL CONSTRUCT string (without PREFIX declarations; those are emitted separately as sh:prefixes blocks in the SHACL shape).

Source code in shacl_bridges/core/sparql.py
def build_sparql_construct(
    class_alignment: dict[str, str],
    source_triples: list[Triple],
    target_triples: list[Triple],
    root_class: str,
    prefix_map: dict[str, str],
    derived_entries: list[ClassMapEntry] | None = None,
) -> str:
    """Generate a SPARQL CONSTRUCT query from the bridge mapping.

    The WHERE clause:
    - Binds ``?this`` to the *root_class* (``?this rdf:type <root_class>``)
    - Includes only *core* source triples — those where both the subject and object
      are source classes present in *class_alignment*. Peripheral/upper-level triples
      (e.g. ``ex:Process isSome ex:ChemicalInvestigation``) exist only at the TBox
      level and are omitted from the SPARQL pattern.
    - Every core triple produces type assertions for the non-root nodes.
    - For each *derived_entry* a ``BIND(IRI(CONCAT(STR(?this), "…")) AS ?derived_X)``
      line is appended to mint a fresh IRI for the split-off instance.

    The CONSTRUCT clause:
    - Asserts new ``rdf:type`` triples for each source → target class mapping
    - Asserts new ``rdf:type`` triples for each derived entry (instance split)
    - Asserts the target-pattern relation triples, with variables resolved via the
      reverse of *class_alignment* and the derived variable map

    Args:
        class_alignment: ``{source_curie: target_curie}`` from the **regular**
            (non-derived) class-map entries.
        source_triples: All triples from ``source_pattern.triples``.
        target_triples: All triples from ``target_pattern.triples``.
        root_class: CURIE of the root class (the ``sh:targetClass``).
        prefix_map: ``{prefix: namespace}`` dict (used for context; not embedded here).
        derived_entries: Entries with a ``derived_iri`` field — each describes one
            instance to be split off from the source and assigned a minted IRI.

    Returns:
        SPARQL CONSTRUCT string (without ``PREFIX`` declarations — those are
        emitted separately as ``sh:prefixes`` blocks in the SHACL shape).
    """
    derived_entries = derived_entries or []
    source_classes = set(class_alignment.keys())

    # Collect all source-side entities in a stable order so variable
    # assignment is deterministic. Class-map sources come first so they
    # always get the lowest-suffix variables.
    all_source_entities: list[str] = []
    seen: set[str] = set()

    for src in class_alignment:
        if src not in seen:
            all_source_entities.append(src)
            seen.add(src)
    for s, _p, o in source_triples:
        for v in (s, o):
            if v not in seen:
                all_source_entities.append(v)
                seen.add(v)

    var_map = _generate_variable_names(all_source_entities)
    var_map[root_class] = "?this"  # root class always binds to ?this

    # ------------------------------------------------------------------
    # Derived-entry variable map
    # Maps each derived target CURIE to a fresh SPARQL variable name.
    # Variable name: ?derived_<LocalName> where LocalName is the part
    # after the last ":" in the CURIE.
    # ------------------------------------------------------------------
    derived_var_map: dict[str, str] = {}
    for entry in derived_entries:
        local = entry.target.split(":")[-1]
        derived_var_map[entry.target] = f"?derived_{local}"

    # ------------------------------------------------------------------
    # WHERE clause
    # ------------------------------------------------------------------
    # Only include source triples where BOTH subject and object are source
    # classes in the class_alignment. Upper-level / taxonomic triples are
    # excluded — they apply only at the TBox level, not in instance data.
    where_lines: list[str] = [f"  ?this rdf:type {root_class} ."]

    for s, p, o in source_triples:
        if s in source_classes and o in source_classes:
            s_var = var_map.get(s, f"?{s}")
            o_var = var_map.get(o, f"?{o}")
            where_lines.append(f"  {s_var} {p} {o_var} .")
            if s != root_class:
                where_lines.append(f"  {s_var} rdf:type {s} .")
            where_lines.append(f"  {o_var} rdf:type {o} .")

    where_lines = list(dict.fromkeys(where_lines))  # deduplicate preserving order

    # BIND lines for derived (instance-split) entries come after pattern triples.
    for entry in derived_entries:
        suffix = entry.derived_iri[len("suffix:"):]   # strip "suffix:" prefix
        var_name = derived_var_map[entry.target]
        where_lines.append(
            f'  BIND(IRI(CONCAT(STR(?this), "{suffix}")) AS {var_name})'
        )

    # ------------------------------------------------------------------
    # CONSTRUCT clause
    # ------------------------------------------------------------------
    construct_lines: list[str] = []
    seen_construct: set[str] = set()

    # 1. rdf:type assertions: each source instance is also asserted as its target type.
    # Blank-node targets (``_:label``) are skipped — blank nodes have no fixed rdf:type.
    for src, tgt in class_alignment.items():
        if tgt.startswith("_:"):
            continue
        src_var = var_map.get(src, f"?{src}")
        line = f"  {src_var} rdf:type {tgt} ."
        if line not in seen_construct:
            construct_lines.append(line)
            seen_construct.add(line)

    # 2. rdf:type assertions for derived (instance-split) targets.
    for entry in derived_entries:
        var_name = derived_var_map[entry.target]
        line = f"  {var_name} rdf:type {entry.target} ."
        if line not in seen_construct:
            construct_lines.append(line)
            seen_construct.add(line)

    # 3. Target-pattern relation triples.
    # Resolve target classes back to their source variables via the reverse map,
    # falling back to derived_var_map for split-off nodes.
    # Blank-node labels (``_:label``) pass through verbatim — SPARQL CONSTRUCT
    # creates a fresh blank node for each solution row.
    rev_alignment = {tgt: src for src, tgt in class_alignment.items()}

    def _resolve_target_node(node: str) -> str:
        """Return the SPARQL term for a target-pattern node."""
        if node.startswith("_:"):
            return node  # blank node label — kept as-is in CONSTRUCT
        if node in derived_var_map:
            return derived_var_map[node]  # derived / minted instance
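        src = rev_alignment.get(node)
        if src:
            return var_map[src]  # mapped class: reuse its source variable
        # Unmapped node: absolute IRIs need angle brackets; CURIEs must stay
        # bare so the sh:prefixes declarations can expand them.
        return f"<{node}>" if node.startswith(("http", "urn")) else node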

    for s, p, o in target_triples:
        s_var = _resolve_target_node(s)
        o_var = _resolve_target_node(o)
        line = f"  {s_var} {p} {o_var} ."
        if line not in seen_construct:
            construct_lines.append(line)
            seen_construct.add(line)

    construct_block = "\n".join(construct_lines)
    where_block = "\n".join(where_lines)

    return f"CONSTRUCT {{\n{construct_block}\n}}\nWHERE {{\n{where_block}\n}}"

shacl_bridges.core.shacl

SHACL shape generation.

Produces a complete Turtle-serialized SHACL document containing:

  1. A sh:NodeShape targeting the root class with nested sh:property constraints that mirror the source design pattern.
  2. A sh:SPARQLRule embedding the SPARQL CONSTRUCT query from :mod:shacl_bridges.core.sparql.

The nested property validation ensures that pyshacl only fires the SPARQL rule against nodes that genuinely conform to the source pattern — preventing the rule from matching isolated instances that happen to share a class name.
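
The embedding of the CONSTRUCT query in a sh:SPARQLRule is plain string assembly. The sketch below illustrates the idea in a reduced form: `embed_construct` is a hypothetical helper, it omits the sh:prefixes and sh:message parts of the rule, and it assumes the query contains no triple quotes or backslashes that would need escaping in Turtle.

```python
def embed_construct(query: str, indent: str = "        ") -> str:
    """Wrap a CONSTRUCT query in a sh:SPARQLRule blank node (Turtle text)."""
    # Indent non-empty lines so the query lines up inside the long string literal.
    body = "\n".join(
        indent + line if line.strip() else line for line in query.splitlines()
    )
    return (
        "    sh:rule [\n"
        "        a sh:SPARQLRule ;\n"
        f'        sh:construct """\n{body}\n        """ ;\n'
        "    ] ;\n"
    )


query = "CONSTRUCT {\n  ?this rdf:type tgt:Activity .\n}\nWHERE {\n  ?this rdf:type ex:Process .\n}"
rule = embed_construct(query)
print(rule)
```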

generate_shacl(mapping, root_class, shape_name='shapes:BridgeShape')

Generate a complete SHACL Turtle document for the given mapping.

Parameters:

Name Type Description Default
mapping BridgeMapping

Loaded :class:~shacl_bridges.io.yaml_reader.BridgeMapping.

required
root_class str

CURIE of the class that the shape targets (sh:targetClass).

required
shape_name str

Prefixed name (CURIE) of the generated sh:NodeShape.

'shapes:BridgeShape'

Returns:

Type Description
str

Full Turtle string ready to be written to a .ttl file.

Source code in shacl_bridges/core/shacl.py
def generate_shacl(
    mapping: BridgeMapping,
    root_class: str,
    shape_name: str = "shapes:BridgeShape",
) -> str:
    """Generate a complete SHACL Turtle document for the given mapping.

    Args:
        mapping: Loaded :class:`~shacl_bridges.io.yaml_reader.BridgeMapping`.
        root_class: CURIE of the class that the shape targets (``sh:targetClass``).
        shape_name: Prefixed name (CURIE) of the generated ``sh:NodeShape``.

    Returns:
        Full Turtle string ready to be written to a ``.ttl`` file.
    """
    prefix_map = mapping.prefix_map()

    # ------------------------------------------------------------------
    # Turtle @prefix declarations
    # ------------------------------------------------------------------
    prefix_lines = [
        "@prefix sh:    <http://www.w3.org/ns/shacl#> .",
        "@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix shapes: <urn:shacl-bridges:shapes#> .",
    ]
    for pfx, ns in prefix_map.items():
        prefix_lines.append(f"@prefix {pfx}: <{ns}> .")
    prefix_block = "\n".join(prefix_lines)

    # ------------------------------------------------------------------
    # Nested sh:property validation (full source pattern, including peripheral)
    # ------------------------------------------------------------------
    G = build_validation_graph(mapping.source_pattern.triples)
    nested = _nested_properties(G, root_class)

    # ------------------------------------------------------------------
    # SPARQL CONSTRUCT query
    # ------------------------------------------------------------------
    construct_query = build_sparql_construct(
        mapping.class_alignment(),
        mapping.source_pattern.triples,
        mapping.target_pattern.triples,
        root_class,
        prefix_map,
        derived_entries=mapping.derived_class_map(),
    )
    # Indent the query body for embedding in Turtle triple-quote string
    indented_query = "\n".join(
        "        " + line if line.strip() else line
        for line in construct_query.splitlines()
    )

    sparql_rule = (
        "    sh:rule [\n"
        "        a sh:SPARQLRule ;\n"
        + _prefix_block(prefix_map, indent=2)
        + "        sh:message \"Bridge rule: transforms source design pattern to target.\" ;\n"
        f"        sh:construct \"\"\"\n{indented_query}\n        \"\"\" ;\n"
        "    ] ;\n"
    )

    # ------------------------------------------------------------------
    # Assemble
    # ------------------------------------------------------------------
    shape = (
        f"{shape_name}\n"
        "    a sh:NodeShape ;\n"
        f"    sh:targetClass {root_class} ;\n"
        + nested
        + sparql_rule
        + ".\n"
    )

    return f"{prefix_block}\n\n{shape}"

shacl_bridges.core.diff

Graph validation and diff computation.

Runs pyshacl in two passes:

  1. Base pass: validates the data graph against itself (no external shape), with RDFS inference enabled. This captures any triples that RDFS alone would add and establishes the baseline.
  2. Bridge pass: runs the generated SHACL shape against the data graph. The SPARQL CONSTRUCT rule fires and adds new triples.

The diff between pass-1 and pass-2 (via rdflib's isomorphic graph diff) gives exactly the triples introduced by the bridge — nothing more.
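
The reasoning behind the two passes can be illustrated with plain Python sets standing in for graphs. This is only a conceptual sketch with made-up triples: `bridge_diff` is a hypothetical name, and a plain set difference ignores blank nodes, which the isomorphic diff in rdflib is there to handle.

```python
Triple = tuple[str, str, str]

data: set[Triple] = {
    ("ex:p1", "rdf:type", "ex:Process"),
    ("ex:Process", "rdfs:subClassOf", "ex:Activity"),
}
rdfs_inferred: set[Triple] = {("ex:p1", "rdf:type", "ex:Activity")}
bridge_added: set[Triple] = {("ex:p1", "rdf:type", "tgt:Activity")}

base = data | rdfs_inferred      # pass 1: data plus RDFS entailments
expanded = base | bridge_added   # pass 2: pass 1 plus the CONSTRUCT output


def bridge_diff(base: set[Triple], expanded: set[Triple]) -> set[Triple]:
    """Triples present after the bridge pass but not in the baseline."""
    return expanded - base


diff = bridge_diff(base, expanded)
print(diff)
```

Because the baseline already contains the RDFS entailments, the inferred `ex:Activity` typing does not show up in the diff; only the triple contributed by the bridge rule does.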

BridgeResult dataclass

Outcome of running the bridge pipeline on a data graph.

Source code in shacl_bridges/core/diff.py
@dataclass
class BridgeResult:
    """Outcome of running the bridge pipeline on a data graph."""

    expanded_graph: Graph
    """The full data graph after SHACL rule application (base + bridged triples)."""

    diff_graph: Graph
    """Only the triples introduced by the bridge (expanded minus inferred base)."""

    conforms: bool
    """Whether the data graph conforms to the validation constraints."""

    report_text: str
    """Human-readable SHACL validation report."""

    report_graph: Graph
    """Machine-readable SHACL report graph."""

conforms instance-attribute

Whether the data graph conforms to the validation constraints.

diff_graph instance-attribute

Only the triples introduced by the bridge (expanded minus inferred base).

expanded_graph instance-attribute

The full data graph after SHACL rule application (base + bridged triples).

report_graph instance-attribute

Machine-readable SHACL report graph.

report_text instance-attribute

Human-readable SHACL validation report.

run_bridge(data_graph, shacl_graph, inference='rdfs')

Apply a SHACL bridge shape to data_graph and return the result.

Parameters:

Name Type Description Default
data_graph Graph

The instance data to transform.

required
shacl_graph Graph

The generated SHACL shape (containing the SPARQLRule).

required
inference str

Reasoner to apply before validation. "rdfs" is the default and sufficient for most harmonization needs. Pass "none" to disable inference entirely.

'rdfs'

Returns:

Type Description
BridgeResult

A :class:BridgeResult with expanded graph, diff, and report.

Source code in shacl_bridges/core/diff.py
def run_bridge(
    data_graph: Graph,
    shacl_graph: Graph,
    inference: str = "rdfs",
) -> BridgeResult:
    """Apply a SHACL bridge shape to *data_graph* and return the result.

    Args:
        data_graph: The instance data to transform.
        shacl_graph: The generated SHACL shape (containing the SPARQLRule).
        inference: Reasoner to apply before validation. ``"rdfs"`` is the default
                   and sufficient for most harmonization needs.  Pass ``"none"``
                   to disable inference entirely.

    Returns:
        A :class:`BridgeResult` with expanded graph, diff, and report.
    """
    # Pass 1 — baseline with inference only, no external shape
    val_base = Validator(
        data_graph,
        options={"advanced": True, "inference": inference},
    )
    _, _, _ = val_base.run()
    inferred_base = val_base.target_graph

    # Pass 2 — full bridge shape
    val_bridge = Validator(
        data_graph,
        shacl_graph=shacl_graph,
        options={"advanced": True, "inference": inference},
    )
    conforms, report_g, report_text = val_bridge.run()
    expanded = val_bridge.target_graph

    # Diff
    iso_base = to_isomorphic(inferred_base)
    iso_expanded = to_isomorphic(expanded)
    _both, _only_base, only_expanded = graph_diff(iso_base, iso_expanded)

    return BridgeResult(
        expanded_graph=expanded,
        diff_graph=only_expanded,
        conforms=bool(conforms),
        report_text=report_text,
        report_graph=report_g,
    )

run_bridge_from_files(data_path, shacl_path, inference='rdfs')

Convenience wrapper: load graphs from file paths, then call :func:run_bridge.

Parameters:

Name Type Description Default
data_path str | Path

Path to the instance data Turtle file.

required
shacl_path str | Path

Path to the SHACL shape Turtle file.

required
inference str

Reasoner to apply.

'rdfs'

Returns:

Type Description
BridgeResult

A :class:BridgeResult.

Source code in shacl_bridges/core/diff.py
def run_bridge_from_files(
    data_path: str | Path,
    shacl_path: str | Path,
    inference: str = "rdfs",
) -> BridgeResult:
    """Convenience wrapper: load graphs from file paths, then call :func:`run_bridge`.

    Args:
        data_path: Path to the instance data Turtle file.
        shacl_path: Path to the SHACL shape Turtle file.
        inference: Reasoner to apply.

    Returns:
        A :class:`BridgeResult`.
    """
    data_graph = Graph()
    data_graph.parse(str(data_path), format="turtle")

    shacl_graph = Graph()
    shacl_graph.parse(str(shacl_path), format="turtle")

    return run_bridge(data_graph, shacl_graph, inference=inference)

save_result(result, expanded_path, diff_path)

Serialize expanded and diff graphs to Turtle files.

Parameters:

Name Type Description Default
result BridgeResult

Output of :func:run_bridge.

required
expanded_path str | Path

Destination path for the expanded graph.

required
diff_path str | Path

Destination path for the diff graph.

required

Source code in shacl_bridges/core/diff.py
def save_result(
    result: BridgeResult,
    expanded_path: str | Path,
    diff_path: str | Path,
) -> None:
    """Serialize expanded and diff graphs to Turtle files.

    Args:
        result: Output of :func:`run_bridge`.
        expanded_path: Destination path for the expanded graph.
        diff_path: Destination path for the diff graph.
    """
    result.expanded_graph.serialize(destination=str(expanded_path), format="turtle")
    result.diff_graph.serialize(destination=str(diff_path), format="turtle")

shacl_bridges.validate

Bridge mapping validator.

Runs structural and semantic checks on a loaded :class:BridgeMapping and returns a list of :class:ValidationIssue objects. An empty list means the mapping passed all checks.

CLI usage::

shacl-bridges validate my_bridge.yaml

Python usage::

from shacl_bridges.io.yaml_reader import load_mapping
from shacl_bridges.validate import validate_mapping, Severity

mapping = load_mapping("bridge.yaml")
issues = validate_mapping(mapping)
errors = [i for i in issues if i.severity == Severity.ERROR]

ValidationIssue dataclass

A single validation finding.

Source code in shacl_bridges/validate.py
@dataclass
class ValidationIssue:
    """A single validation finding."""

    severity: Severity
    message: str
    hint: str = field(default="")

    def __str__(self) -> str:
        icon = "✗" if self.severity == Severity.ERROR else "⚠"
        line = f"{icon}  {self.message}"
        if self.hint:
            line += f"\n   hint: {self.hint}"
        return line
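
The string form of an issue is easiest to see with a small standalone example. The snippet re-declares the dataclass so that it runs without the package; the `Severity` enum shown here is an assumption (only the member names ERROR and WARNING appear in the source), and `exit_code` is a hypothetical policy helper, not part of the library.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Assumed definition: only the member names are known from the source.
    ERROR = "error"
    WARNING = "warning"


@dataclass
class ValidationIssue:
    """Stand-in copy of the dataclass above so the snippet runs on its own."""

    severity: Severity
    message: str
    hint: str = ""

    def __str__(self) -> str:
        icon = "✗" if self.severity == Severity.ERROR else "⚠"
        line = f"{icon}  {self.message}"
        if self.hint:
            line += f"\n   hint: {self.hint}"
        return line


def exit_code(issues: list[ValidationIssue]) -> int:
    """Return 1 if any issue is an error, else 0 (hypothetical policy)."""
    return 1 if any(i.severity == Severity.ERROR for i in issues) else 0


issues = [
    ValidationIssue(Severity.ERROR, "Prefix 'foo' is not declared", "Add it to prefixes"),
    ValidationIssue(Severity.WARNING, "target_pattern.triples do not form a connected graph"),
]
for issue in issues:
    print(issue)
print("exit code:", exit_code(issues))
```

Treating warnings as non-fatal is a design choice of this sketch; a stricter pipeline could return a non-zero code for any issue.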

validate_mapping(mapping)

Run all validation checks on mapping.

Checks performed:

  1. Prefix completeness: every CURIE used references a declared prefix (or a well-known built-in).
  2. Root exists: source_pattern.root (if set) appears in at least one source triple.
  3. Source connectivity: every node in source_pattern is reachable from the chosen root. Disconnected nodes would cause silent over-matching.
  4. Class-map sources ⊆ source nodes: every class_map[].source appears in source_pattern.triples.
  5. Class-map targets ⊆ target nodes: every class_map[].target appears in target_pattern.triples.
  6. Target connectivity: the target pattern forms a connected graph. Disconnected target nodes produce isolated triples in the bridge output.
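
As an illustration of the first check in the list above, the sketch below collects undeclared prefixes from a list of CURIEs. It is a simplified stand-in: `undeclared_prefixes` is a hypothetical name, and the contents of `BUILTIN_PREFIXES` are an assumption, since the library's built-in set is not shown on this page.

```python
BUILTIN_PREFIXES = {"rdf", "rdfs", "xsd", "owl", "sh"}  # assumed built-ins


def undeclared_prefixes(curies: list[str], declared: set[str]) -> list[str]:
    """Return each undeclared prefix once, in first-seen order."""
    known = set(declared) | BUILTIN_PREFIXES
    missing: list[str] = []
    for curie in curies:
        # Skip plain names, absolute IRIs, and blank-node labels.
        if ":" not in curie or curie.startswith(("http", "urn", "_")):
            continue
        prefix = curie.split(":")[0]
        if prefix not in known and prefix not in missing:
            missing.append(prefix)
    return missing


curies = ["ex:A", "foo:B", "rdf:type", "_:b", "http://example.org/y", "foo:C"]
print(undeclared_prefixes(curies, {"ex"}))  # ['foo']
```

Reporting each prefix only once keeps the output short when the same undeclared prefix is used in many triples.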

Parameters:

Name Type Description Default
mapping BridgeMapping

A loaded :class:~shacl_bridges.io.yaml_reader.BridgeMapping.

required

Returns:

Type Description
list[ValidationIssue]

List of :class:ValidationIssue objects. An empty list means the mapping passed all checks.

Source code in shacl_bridges/validate.py
def validate_mapping(mapping: BridgeMapping) -> list[ValidationIssue]:
    """Run all validation checks on *mapping*.

    Checks performed:

    1. **Prefix completeness** — every CURIE used references a declared prefix
       (or a well-known built-in).
    2. **Root exists** — ``source_pattern.root`` (if set) appears in at least
       one source triple.
    3. **Source connectivity** — every node in ``source_pattern`` is reachable
       from the chosen root. Disconnected nodes would cause silent over-matching.
    4. **Class-map sources ⊆ source nodes** — every ``class_map[].source`` appears
       in ``source_pattern.triples``.
    5. **Class-map targets ⊆ target nodes** — every ``class_map[].target`` appears
       in ``target_pattern.triples``.
    6. **Target connectivity** — the target pattern forms a connected graph.
       Disconnected target nodes produce isolated triples in the bridge output.

    Args:
        mapping: A loaded :class:`~shacl_bridges.io.yaml_reader.BridgeMapping`.

    Returns:
        List of :class:`ValidationIssue` objects. An empty list means the
        mapping passed all checks.
    """
    issues: list[ValidationIssue] = []

    # ------------------------------------------------------------------
    # 1. Prefix completeness
    # ------------------------------------------------------------------
    declared = set(mapping.prefixes.keys()) | _BUILTIN_PREFIXES
    seen_bad_prefixes: set[str] = set()
    for curie in _all_curies(mapping):
        if ":" in curie and not curie.startswith(("http", "urn", "_")):
            prefix = curie.split(":")[0]
            if prefix not in declared and prefix not in seen_bad_prefixes:
                seen_bad_prefixes.add(prefix)
                issues.append(ValidationIssue(
                    Severity.ERROR,
                    f"Prefix '{prefix}' (used in '{curie}') is not declared in prefixes",
                    f"Add '{prefix}: <namespace_IRI>' to the prefixes block",
                ))

    # ------------------------------------------------------------------
    # 2. Root exists in source_pattern
    # ------------------------------------------------------------------
    root = mapping.source_pattern.root
    src_nodes = _source_nodes(mapping)
    if root and root not in src_nodes:
        issues.append(ValidationIssue(
            Severity.ERROR,
            f"source_pattern.root '{root}' does not appear in any source_pattern triple",
            "Check for typos or add a triple that involves this class",
        ))

    # ------------------------------------------------------------------
    # 3. Source graph connectivity from root
    # ------------------------------------------------------------------
    if mapping.source_pattern.triples:
        try:
            effective_root = select_root_class(mapping.source_pattern.triples, root)
            # If the explicit root was flagged as absent (check 2), fall back to
            # the automatically selected root for the connectivity check
            if effective_root not in src_nodes:
                effective_root = select_root_class(mapping.source_pattern.triples, None)
            disconnected = check_connectivity(mapping.source_pattern.triples, effective_root)
            for node in disconnected:
                issues.append(ValidationIssue(
                    Severity.WARNING,
                    f"'{node}' is not reachable from root '{effective_root}' in source_pattern",
                    (
                        "This node will be omitted from the SPARQL WHERE clause, "
                        "causing silent over-matching. Set a different source_pattern.root "
                        "or connect this node to the rest of the pattern."
                    ),
                ))
        except ValueError as exc:
            issues.append(ValidationIssue(Severity.ERROR, str(exc)))

    # ------------------------------------------------------------------
    # 4. Class-map sources ⊆ source_pattern nodes
    # ------------------------------------------------------------------
    for entry in mapping.class_map:
        if entry.source not in src_nodes:
            issues.append(ValidationIssue(
                Severity.ERROR,
                f"class_map source '{entry.source}' does not appear in source_pattern.triples",
                (
                    "Add a source_pattern triple that involves this class, "
                    "or remove the class_map entry"
                ),
            ))

    # ------------------------------------------------------------------
    # 5. Class-map targets ⊆ target_pattern nodes
    # ------------------------------------------------------------------
    tgt_nodes = _target_nodes(mapping)
    for entry in mapping.class_map:
        if entry.target not in tgt_nodes:
            issues.append(ValidationIssue(
                Severity.ERROR,
                f"class_map target '{entry.target}' does not appear in target_pattern.triples",
                (
                    "Add a target_pattern triple that involves this class, "
                    "or remove the class_map entry"
                ),
            ))

    # ------------------------------------------------------------------
    # 6. Target pattern connectivity
    # ------------------------------------------------------------------
    if len(mapping.target_pattern.triples) > 1:
        G_tgt = nx.DiGraph()
        for s, _p, o in mapping.target_pattern.triples:
            G_tgt.add_edge(s, o)
        if not nx.is_weakly_connected(G_tgt):
            issues.append(ValidationIssue(
                Severity.WARNING,
                "target_pattern.triples do not form a connected graph",
                (
                    "Disconnected target nodes may produce isolated triples "
                    "in the bridge output that are hard to trace"
                ),
            ))

    return issues

shacl_bridges.visualize.mermaid

Mermaid flowchart generation.

Produces a Mermaid flowchart TD diagram that shows:

  • Core source nodes (in class_map) — rectangle [Label]
  • Peripheral source nodes (validation-only, not in class_map) — stadium shape ([Label])
  • Target nodes — rounded rectangle (Label)
  • ShapeValidation subgraph:
    • CoreShapeInformation inner subgraph: core structural triples (thick ==> arrows)
    • Peripheral/upper-level triples outside the inner subgraph (thin ---> arrows)
  • TransformedGraph subgraph: target pattern triples (--> arrows)
  • Bridge connections: dotted -.....-> arrows from each source class to its target class

This diagram is generated automatically from the YAML mapping and stays in sync with the source/target patterns without manual maintenance.
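
The node shapes and the dotted bridge arrow are ordinary Mermaid syntax assembled from strings. The following sketch shows the two pieces in isolation; `dotted_arrow` and `node_declaration` are hypothetical helper names, and the node IDs and labels are made up.

```python
def dotted_arrow(src_len: int, tgt_len: int) -> str:
    """Dotted link whose dot count is the two path lengths plus three."""
    return "-" + "." * (src_len + tgt_len + 3) + "->"


def node_declaration(node_id: str, label: str, kind: str) -> str:
    """Mermaid node line: rectangle, stadium, or rounded rectangle."""
    shapes = {"core": "[{}]", "peripheral": "([{}])", "target": "({})"}
    return f"    {node_id}{shapes[kind].format(label)}"


print(dotted_arrow(1, 1))  # -.....->
print(node_declaration("ex:Process", "Process", "core"))
print(node_declaration("ex:ChemicalInvestigation", "ChemicalInvestigation", "peripheral"))
print(node_declaration("tgt:Activity", "Activity", "target"))
```

In Mermaid, extra dots in a dotted link make it span more ranks, which is presumably why the dot count is tied to the path lengths of the two patterns.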

generate_mermaid(mapping)

Generate a Mermaid flowchart diagram for mapping.

Parameters:

Name Type Description Default
mapping BridgeMapping

Loaded :class:~shacl_bridges.io.yaml_reader.BridgeMapping.

required

Returns:

Type Description
str

Mermaid diagram string (suitable for embedding in Markdown or saving to a .mmd file).

Source code in shacl_bridges/visualize/mermaid.py
def generate_mermaid(mapping: BridgeMapping) -> str:
    """Generate a Mermaid flowchart diagram for *mapping*.

    Args:
        mapping: Loaded :class:`~shacl_bridges.io.yaml_reader.BridgeMapping`.

    Returns:
        Mermaid diagram string (suitable for embedding in Markdown or saving
        to a ``.mmd`` file).
    """
    source_triples = mapping.source_pattern.triples
    target_triples = mapping.target_pattern.triples
    class_alignment = mapping.class_alignment()

    # Core source classes = those present in the class_map (will be bridged)
    source_classes: set[str] = set(class_alignment.keys())

    # All nodes that appear anywhere in the source pattern
    all_source_nodes: set[str] = set()
    for s, _p, o in source_triples:
        all_source_nodes.add(s)
        all_source_nodes.add(o)

    # All nodes that appear anywhere in the target pattern
    all_target_nodes: set[str] = set()
    for s, _p, o in target_triples:
        all_target_nodes.add(s)
        all_target_nodes.add(o)

    # Peripheral = source nodes that are NOT bridged (validation-only)
    peripheral: set[str] = all_source_nodes - source_classes

    # Path lengths for dotted bridge arrow sizing
    try:
        G_src = build_validation_graph(source_triples)
        src_len = longest_path_length(G_src)
    except ValueError:
        src_len = 1
    try:
        G_tgt = build_validation_graph(target_triples)
        tgt_len = longest_path_length(G_tgt)
    except ValueError:
        tgt_len = 1

    dot_count = src_len + tgt_len + 3
    dotted = "-" + "." * dot_count + "->"

    lines: list[str] = ["flowchart TD"]

    # ------------------------------------------------------------------
    # Node declarations
    # ------------------------------------------------------------------
    for node in sorted(source_classes):
        lines.append(f"    {node}[{_local_name(node)}]")
    for node in sorted(peripheral):
        lines.append(f"    {node}([{_local_name(node)}])")
    for node in sorted(all_target_nodes):
        lines.append(f"    {node}({_local_name(node)})")
    lines.append("")

    # ------------------------------------------------------------------
    # ShapeValidation subgraph
    # ------------------------------------------------------------------
    lines.append("    subgraph ShapeValidation")
    lines.append("        subgraph CoreShapeInformation")

    extended: list[str] = []
    for s, p, o in source_triples:
        if s in source_classes and o in source_classes:
            lines.append(f"        {s} ==>|{_local_name(p)}| {o}")
        else:
            extended.append(f"    {s} --->|{_local_name(p)}| {o}")

    lines.append("        end")
    lines.extend(extended)
    lines.append("    end")
    lines.append("")

    # ------------------------------------------------------------------
    # TransformedGraph subgraph
    # ------------------------------------------------------------------
    lines.append("    subgraph TransformedGraph")
    for s, p, o in target_triples:
        lines.append(f"    {s} -->|{_local_name(p)}| {o}")
    lines.append("    end")
    lines.append("")

    # ------------------------------------------------------------------
    # Bridge connections (dotted arrows from source to target class)
    # ------------------------------------------------------------------
    for src, tgt in sorted(class_alignment.items()):
        lines.append(f"    {src} {dotted}|SHACL_bridge| {tgt}")

    return "\n".join(lines)

generate_mermaid_markdown(mapping)

Wrap the Mermaid diagram in a fenced code block for Markdown embedding.

Source code in shacl_bridges/visualize/mermaid.py
def generate_mermaid_markdown(mapping: BridgeMapping) -> str:
    """Wrap the Mermaid diagram in a fenced code block for Markdown embedding."""
    diagram = generate_mermaid(mapping)
    return f"```mermaid\n{diagram}\n```"