Validators API Reference

Validation classes and utilities.

XSD Validator

XSDValidator

Bases: BaseValidator

XSD schema validator for C-CDA documents.

Validates C-CDA documents against the official HL7 C-CDA XSD schemas.

Usage

    # Use default schemas (auto-downloads if needed)
    validator = XSDValidator()
    result = validator.validate(document)
    if result.is_valid:
        print("Document is valid!")
    else:
        print("Validation errors:")
        for error in result.errors:
            print(f"  - {error}")

    # Or provide a custom schema path
    validator = XSDValidator("/path/to/schemas/CDA.xsd")
    result = validator.validate(document)

Note

XSD schemas are automatically downloaded on first use if not present. Set auto_download=False to disable automatic downloads.

Attributes

schema_location property

Get the schema file location.

Functions

__init__(schema_path=None, auto_download=True)

Initialize XSD validator with schema file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema_path | Optional[Union[str, Path]] | Path to the CDA.xsd schema file. If None, uses the default location and auto-downloads if needed. | None |
| auto_download | bool | Automatically download schemas if missing. Set to False to disable automatic downloads. | True |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the schema file doesn't exist and auto_download=False |
| XMLSchemaParseError | If the schema is invalid |

Note

On first use, XSD schemas (~2MB) will be automatically downloaded from HL7's official repository. This may take a few moments.

validate(document)

Validate a C-CDA document against XSD schema.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| document | Union[_Element, str, bytes, Path] | Document to validate: an etree._Element (parsed XML element), a str (XML string or file path), bytes (XML bytes), or a Path (path to an XML file). | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors from schema validation |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file path doesn't exist |
| XMLSyntaxError | If the document is not well-formed XML |

validate_file(file_path)

Convenience method to validate a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Union[str, Path] | Path to the XML file | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors from schema validation |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file doesn't exist |

validate_string(xml_string)

Convenience method to validate an XML string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xml_string | str | XML document as a string | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors from schema validation |

validate_bytes(xml_bytes)

Convenience method to validate XML bytes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xml_bytes | bytes | XML document as bytes | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors from schema validation |

Schematron Validator

SchematronValidator

Bases: BaseValidator

Schematron validator for C-CDA documents.

Validates C-CDA documents using ISO Schematron rules for business logic, template conformance, and ONC certification requirements.

Usage

    # Use default HL7 C-CDA R2.1 Schematron (auto-cleaned version)
    validator = SchematronValidator()
    result = validator.validate(document)
    if result.is_valid:
        print("Document passes Schematron validation!")
    else:
        print("Validation errors:")
        for error in result.errors:
            print(f"  - {error}")

    # Use a custom Schematron file
    validator = SchematronValidator("/path/to/custom.sch")
    result = validator.validate(document)

Note

Schematron validation requires both the .sch file and voc.xml vocabulary file. These are automatically downloaded and cleaned on first use.

The official HL7 Schematron file contains IDREF errors that prevent lxml from loading it. This validator automatically uses a cleaned version that fixes these errors while preserving all validation rules.

Source code in ccdakit/validators/schematron.py
class SchematronValidator(BaseValidator):
    """
    Schematron validator for C-CDA documents.

    Validates C-CDA documents using ISO Schematron rules for business logic,
    template conformance, and ONC certification requirements.

    Usage:
        >>> # Use default HL7 C-CDA R2.1 Schematron (auto-cleaned version)
        >>> validator = SchematronValidator()
        >>> result = validator.validate(document)
        >>> if result.is_valid:
        ...     print("Document passes Schematron validation!")
        ... else:
        ...     print("Validation errors:")
        ...     for error in result.errors:
        ...         print(f"  - {error}")

        >>> # Use custom Schematron file
        >>> validator = SchematronValidator("/path/to/custom.sch")
        >>> result = validator.validate(document)

    Note:
        Schematron validation requires both the .sch file and voc.xml vocabulary file.
        These are automatically downloaded and cleaned on first use.

        The official HL7 Schematron file contains IDREF errors that prevent lxml
        from loading it. This validator automatically uses a cleaned version that
        fixes these errors while preserving all validation rules.
    """

    # SVRL namespace for validation report
    SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

    def __init__(
        self,
        schematron_path: Optional[Union[str, Path]] = None,
        phase: Optional[str] = None,
        auto_download: bool = True,
    ):
        """
        Initialize Schematron validator.

        Args:
            schematron_path: Path to Schematron file (.sch).
                If None, uses default HL7 C-CDA R2.1 Schematron.
            phase: Schematron phase to use (e.g., "errors", "warnings").
                If None, validates all phases.
            auto_download: Automatically download Schematron files if missing.
                Default: True. Set to False to disable automatic downloads.

        Raises:
            FileNotFoundError: If schematron file doesn't exist and auto_download=False
            etree.SchematronParseError: If schematron is invalid

        Note:
            On first use, Schematron files (~63MB) will be automatically downloaded
            from HL7's official GitHub repository. This may take a few moments.
        """
        self.schematron_path = self._resolve_schematron_path(schematron_path)
        self.phase = phase
        self.auto_download = auto_download

        # Attempt auto-download if file doesn't exist
        if not self.schematron_path.exists() and self.auto_download:
            self._attempt_auto_download()

        # Check if file exists after download attempt
        if not self.schematron_path.exists():
            raise FileNotFoundError(
                f"Schematron file not found: {self.schematron_path}\n"
                "Expected file: schemas/schematron/HL7_CCDA_R2.1.sch\n\n"
                "Options:\n"
                "1. Allow automatic download (default): SchematronValidator(auto_download=True)\n"
                "2. Download manually from: https://github.com/HL7/CDA-ccda-2.1\n"
                "3. Provide your own file: SchematronValidator(schematron_path='/path/to/file.sch')"
            )

        self.schematron = self._load_schematron()

    def _resolve_schematron_path(self, path: Optional[Union[str, Path]]) -> Path:
        """
        Resolve Schematron file path.

        Uses the cleaned version of HL7 C-CDA R2.1 Schematron by default, as the
        original file contains IDREF errors that prevent lxml from loading it.

        Args:
            path: User-provided path or None for default

        Returns:
            Resolved Path object
        """
        if path is not None:
            return Path(path)

        # Default to cleaned HL7 C-CDA R2.1 Schematron in package
        # The cleaned version has IDREF errors fixed for lxml compatibility
        current_dir = Path(__file__).parent
        package_root = current_dir.parent.parent

        # Check common locations for cleaned version first
        cleaned_locations = [
            package_root / "schemas" / "schematron" / "HL7_CCDA_R2.1_cleaned.sch",
            Path("schemas") / "schematron" / "HL7_CCDA_R2.1_cleaned.sch",
            Path.cwd() / "schemas" / "schematron" / "HL7_CCDA_R2.1_cleaned.sch",
        ]

        for location in cleaned_locations:
            if location.exists():
                return location

        # Fall back to original (for backwards compatibility)
        original_locations = [
            package_root / "schemas" / "schematron" / "HL7_CCDA_R2.1.sch",
            Path("schemas") / "schematron" / "HL7_CCDA_R2.1.sch",
            Path.cwd() / "schemas" / "schematron" / "HL7_CCDA_R2.1.sch",
        ]

        for location in original_locations:
            if location.exists():
                return location

        # Return default expected location (cleaned version, may not exist yet)
        return package_root / "schemas" / "schematron" / "HL7_CCDA_R2.1_cleaned.sch"

    def _attempt_auto_download(self) -> None:
        """
        Attempt to automatically download Schematron files.

        This method tries to download the official HL7 C-CDA R2.1 Schematron
        files if they're not present. Downloads are only attempted once.
        """
        try:
            print("Schematron files not found. Attempting automatic download...")
            print("This is a one-time download (~63MB). Please wait...")

            downloader = SchematronDownloader()
            success, message = downloader.download_all(force=False)

            if success:
                print(message)
                print("✓ Schematron files ready for validation!")
            else:
                warnings.warn(
                    f"Automatic download failed:\n{message}\n"
                    "You can provide your own Schematron file or download manually.",
                    UserWarning,
                    stacklevel=2,
                )

        except Exception as e:
            warnings.warn(
                f"Automatic download failed: {e}\n"
                "You can provide your own Schematron file using: "
                "SchematronValidator(schematron_path='/path/to/file.sch')",
                UserWarning,
                stacklevel=2,
            )

    def _load_schematron(self) -> isoschematron.Schematron:
        """
        Load and compile Schematron rules.

        Returns:
            Compiled Schematron object

        Raises:
            etree.SchematronParseError: If schematron is invalid
        """
        try:
            # Create custom resolver for voc.xml and other includes
            class SchematronResolver(etree.Resolver):
                def __init__(self, base_path: Path):
                    self.base_path = base_path.parent
                    super().__init__()

                def resolve(self, url, id, context):
                    # Handle voc.xml and other relative references
                    if url and not url.startswith(("http://", "https://", "file://")):
                        resolved_path = self.base_path / url
                        if resolved_path.exists():
                            return self.resolve_filename(str(resolved_path), context)
                    return None

            # Parse Schematron document with custom resolver
            parser = etree.XMLParser()
            parser.resolvers.add(SchematronResolver(self.schematron_path))

            # Parse with file path to set base URL
            schematron_doc = etree.parse(str(self.schematron_path), parser)

            # Create Schematron validator
            # store_schematron=True keeps the compiled schematron for inspection
            # store_report=True keeps validation reports for error extraction
            # For HL7 files, we need to be less strict about validation
            kwargs = {
                "store_schematron": True,
                "store_report": True,
                # Skip schema validation to be more permissive with HL7 files
                "validate_schema": False,
            }

            if self.phase is not None:
                kwargs["phase"] = self.phase

            return isoschematron.Schematron(schematron_doc, **kwargs)

        except etree.XMLSyntaxError as e:
            raise etree.SchematronParseError(
                f"Failed to parse Schematron file at {self.schematron_path}: {e}"
            ) from e
        except Exception as e:
            raise etree.SchematronParseError(
                f"Failed to load Schematron at {self.schematron_path}: {e}"
            ) from e

    def validate(self, document: Union[etree._Element, str, bytes, Path]) -> ValidationResult:
        """
        Validate a C-CDA document against Schematron rules.

        Args:
            document: Document to validate. Can be:
                - etree._Element: Parsed XML element
                - str: XML string or file path
                - bytes: XML bytes
                - Path: Path to XML file

        Returns:
            ValidationResult with Schematron validation findings

        Raises:
            FileNotFoundError: If file path doesn't exist
            etree.XMLSyntaxError: If document is not well-formed XML
        """
        result = ValidationResult()

        try:
            # Parse document
            doc_element = self._parse_document(document)

            # Run Schematron validation
            is_valid = self.schematron.validate(doc_element)

            if not is_valid:
                # Extract validation messages from SVRL report
                report = self.schematron.validation_report
                issues = self._extract_issues_from_report(report)

                # Categorize issues by level (schematron reports as failed-assert or successful-report)
                for issue in issues:
                    if issue.level == ValidationLevel.ERROR:
                        result.errors.append(issue)
                    elif issue.level == ValidationLevel.WARNING:
                        result.warnings.append(issue)
                    else:
                        result.infos.append(issue)

        except etree.XMLSyntaxError as e:
            result.errors.append(
                ValidationIssue(
                    level=ValidationLevel.ERROR,
                    message=f"XML syntax error: {e}",
                    location=f"Line {e.lineno}" if hasattr(e, "lineno") else None,
                    code="XML_SYNTAX_ERROR",
                )
            )
        except FileNotFoundError as e:
            result.errors.append(
                ValidationIssue(
                    level=ValidationLevel.ERROR,
                    message=str(e),
                    code="FILE_NOT_FOUND",
                )
            )
        except Exception as e:
            result.errors.append(
                ValidationIssue(
                    level=ValidationLevel.ERROR,
                    message=f"Schematron validation error: {e}",
                    code="SCHEMATRON_ERROR",
                )
            )

        return result

    def _extract_issues_from_report(self, report: etree._Element) -> List[ValidationIssue]:
        """
        Extract validation issues from SVRL report.

        Args:
            report: SVRL validation report element

        Returns:
            List of ValidationIssue objects
        """
        issues = []

        # Extract failed assertions (errors)
        for element in report.findall(f".//{{{self.SVRL_NS}}}failed-assert"):
            issue = self._parse_failed_assert(element)
            if issue:
                issues.append(issue)

        # Extract successful reports (warnings/info)
        for element in report.findall(f".//{{{self.SVRL_NS}}}successful-report"):
            issue = self._parse_successful_report(element)
            if issue:
                issues.append(issue)

        return issues

    def _parse_failed_assert(self, element: etree._Element) -> Optional[ValidationIssue]:
        """
        Parse failed-assert element from SVRL report.

        Args:
            element: failed-assert element

        Returns:
            ValidationIssue or None
        """
        # Extract message text
        text_elem = element.find(f"{{{self.SVRL_NS}}}text")
        if text_elem is None:
            return None

        message = self._extract_text_content(text_elem)
        if not message:
            return None

        # Extract location (XPath where assertion failed)
        location = element.get("location")

        # Extract rule ID (CONF ID or template ID)
        rule_id = element.get("id")

        # Build error code from rule ID
        code = f"SCHEMATRON_{rule_id}" if rule_id else "SCHEMATRON_ERROR"

        # Format full error message for parser
        full_message = f"ERROR at {location}: {message}" if location else f"ERROR: {message}"

        # Parse error for enhanced display
        parsed_error = SchematronErrorParser.parse_error(full_message)

        return ValidationIssue(
            level=ValidationLevel.ERROR,
            message=message,
            location=location,
            code=code,
            parsed_data=parsed_error.to_dict(),
        )

    def _parse_successful_report(self, element: etree._Element) -> Optional[ValidationIssue]:
        """
        Parse successful-report element from SVRL report.

        Successful reports are typically warnings or informational messages.

        Args:
            element: successful-report element

        Returns:
            ValidationIssue or None
        """
        # Extract message text
        text_elem = element.find(f"{{{self.SVRL_NS}}}text")
        if text_elem is None:
            return None

        message = self._extract_text_content(text_elem)
        if not message:
            return None

        # Extract location
        location = element.get("location")

        # Extract rule ID
        rule_id = element.get("id")

        # Determine level based on rule ID or message content
        # C-CDA Schematron typically uses role="warning" or role="info"
        role = element.get("role", "").lower()
        if "warning" in role or "warn" in message.lower():
            level = ValidationLevel.WARNING
        else:
            level = ValidationLevel.INFO

        code = f"SCHEMATRON_{rule_id}" if rule_id else "SCHEMATRON_INFO"

        # Format full message for parser
        severity_label = "WARNING" if level == ValidationLevel.WARNING else "INFO"
        full_message = (
            f"{severity_label} at {location}: {message}"
            if location
            else f"{severity_label}: {message}"
        )

        # Parse for enhanced display
        parsed_error = SchematronErrorParser.parse_error(full_message)

        return ValidationIssue(
            level=level,
            message=message,
            location=location,
            code=code,
            parsed_data=parsed_error.to_dict(),
        )

    def _extract_text_content(self, element: etree._Element) -> str:
        """
        Extract text content from element, handling nested elements.

        Args:
            element: Element containing text

        Returns:
            Concatenated text content
        """
        # Get all text including from nested elements
        text_parts = []

        # Get element's direct text
        if element.text:
            text_parts.append(element.text.strip())

        # Get text and tail from direct children only (deeper descendants are not visited)
        for child in element:
            if child.text:
                text_parts.append(child.text.strip())
            if child.tail:
                text_parts.append(child.tail.strip())

        # Join and clean up
        full_text = " ".join(text_parts)
        # Remove extra whitespace
        return " ".join(full_text.split())

    def validate_file(self, file_path: Union[str, Path]) -> ValidationResult:
        """
        Convenience method to validate a file.

        Args:
            file_path: Path to XML file

        Returns:
            ValidationResult with Schematron validation findings

        Raises:
            FileNotFoundError: If file doesn't exist
        """
        return self.validate(Path(file_path))

    def validate_string(self, xml_string: str) -> ValidationResult:
        """
        Convenience method to validate an XML string.

        Args:
            xml_string: XML document as string

        Returns:
            ValidationResult with Schematron validation findings
        """
        return self.validate(xml_string)

    def validate_bytes(self, xml_bytes: bytes) -> ValidationResult:
        """
        Convenience method to validate XML bytes.

        Args:
            xml_bytes: XML document as bytes

        Returns:
            ValidationResult with Schematron validation findings
        """
        return self.validate(xml_bytes)

    @property
    def schematron_location(self) -> Path:
        """Get the Schematron file location."""
        return self.schematron_path

    @property
    def validation_phase(self) -> Optional[str]:
        """Get the validation phase being used."""
        return self.phase
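
As a rough illustration of the report-parsing step in the class source above, the following sketch pulls failed-assert and successful-report entries out of a hand-written SVRL fragment. It is not the library's code: it uses the standard library's xml.etree.ElementTree instead of lxml, the sample report, the rule IDs, and the `classify` and `extract_issues` helpers are invented for the example, and text is gathered with `itertext()`, which, unlike the child loop in `_extract_text_content`, also descends into deeper nesting.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as SVRL_NS in the class source.
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hand-written sample report (hypothetical content, for illustration only).
SAMPLE_REPORT = f"""
<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:failed-assert id="rule-1" location="/ClinicalDocument/recordTarget">
    <svrl:text>SHALL contain exactly one <svrl:emph>patientRole</svrl:emph> element.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report id="rule-2" role="warning" location="/ClinicalDocument/author">
    <svrl:text>SHOULD contain a telecom element.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
"""


def classify(kind: str, role: str, message: str) -> str:
    """Failed asserts are errors; reports are warnings if the role or the
    message mentions a warning, otherwise informational."""
    if kind == "failed-assert":
        return "ERROR"
    if "warning" in role.lower() or "warn" in message.lower():
        return "WARNING"
    return "INFO"


def extract_issues(report_xml: str) -> list:
    """Return one dict per failed-assert / successful-report that has text."""
    root = ET.fromstring(report_xml)
    issues = []
    for kind in ("failed-assert", "successful-report"):
        fallback = "SCHEMATRON_ERROR" if kind == "failed-assert" else "SCHEMATRON_INFO"
        for element in root.findall(f".//{{{SVRL_NS}}}{kind}"):
            text_elem = element.find(f"{{{SVRL_NS}}}text")
            if text_elem is None:
                continue
            # itertext() walks all nested text; split/join collapses whitespace.
            message = " ".join("".join(text_elem.itertext()).split())
            if not message:
                continue
            rule_id = element.get("id")
            issues.append({
                "level": classify(kind, element.get("role", ""), message),
                "message": message,
                "location": element.get("location"),
                "code": f"SCHEMATRON_{rule_id}" if rule_id else fallback,
            })
    return issues


issues = extract_issues(SAMPLE_REPORT)
for issue in issues:
    print(issue["level"], issue["code"], issue["message"])
```

Note that the "warn" substring check on the message means any report whose text happens to contain "warn" is classified as a warning, even without a role attribute.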

Attributes

schematron_location property

Get the Schematron file location.

validation_phase property

Get the validation phase being used.

Functions

__init__(schematron_path=None, phase=None, auto_download=True)

Initialize Schematron validator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schematron_path | Optional[Union[str, Path]] | Path to the Schematron file (.sch). If None, uses the default HL7 C-CDA R2.1 Schematron. | None |
| phase | Optional[str] | Schematron phase to use (e.g., "errors", "warnings"). If None, validates all phases. | None |
| auto_download | bool | Automatically download Schematron files if missing. Set to False to disable automatic downloads. | True |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the Schematron file doesn't exist and auto_download=False |
| SchematronParseError | If the Schematron file is invalid |
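
When `schematron_path` is None, `_resolve_schematron_path` in the class source above checks a list of cleaned-file locations, then the original-file locations, and finally falls back to a default path that may not exist yet. A minimal sketch of that first-match search follows; the `first_existing` helper is hypothetical, and a temporary directory stands in for the real package layout.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def first_existing(candidates: Iterable[Path], default: Path) -> Path:
    """Return the first candidate that exists on disk, else the default."""
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return default


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "schemas" / "schematron"
    root.mkdir(parents=True)
    cleaned = root / "HL7_CCDA_R2.1_cleaned.sch"
    original = root / "HL7_CCDA_R2.1.sch"

    # Nothing on disk yet: the default (cleaned) path comes back even though it is missing.
    missing = first_existing([cleaned, original], default=cleaned)
    missing_exists = missing.exists()

    # Only the original file is present: it is chosen as the fallback.
    original.write_text("<schema/>")
    picked_original = first_existing([cleaned, original], default=cleaned)

    # Once the cleaned file exists, it wins because it is listed first.
    cleaned.write_text("<schema/>")
    picked_cleaned = first_existing([cleaned, original], default=cleaned)

print(missing.name, missing_exists)
print(picked_original.name)
print(picked_cleaned.name)
```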

Note

On first use, Schematron files (~63MB) will be automatically downloaded from HL7's official GitHub repository. This may take a few moments.
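
The `__init__` logic in the class source above can be summarised as: attempt the download only if the file is missing and downloads are allowed, then raise if the file is still absent. The sketch below shows that control flow under stated assumptions: `ensure_file` is a hypothetical helper, and the caller-supplied `fetch` callable (here `fake_fetch`, which touches no network) stands in for the real downloader.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Callable


def ensure_file(path: Path, auto_download: bool, fetch: Callable[[Path], None]) -> Path:
    """Return path if it exists (downloading first when allowed), else raise."""
    if not path.exists() and auto_download:
        try:
            fetch(path)
        except Exception as exc:
            # A failed download becomes a warning; the existence check below decides.
            warnings.warn(f"Automatic download failed: {exc}", UserWarning, stacklevel=2)
    if not path.exists():
        raise FileNotFoundError(f"Schematron file not found: {path}")
    return path


def fake_fetch(target: Path) -> None:
    """Stand-in downloader: writes a placeholder file instead of using the network."""
    target.write_text("<schema/>")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "rules.sch"

    # Downloads disabled and file missing: FileNotFoundError.
    try:
        ensure_file(target, auto_download=False, fetch=fake_fetch)
        raised = False
    except FileNotFoundError:
        raised = True

    # Downloads enabled: the stand-in fetch creates the file.
    resolved = ensure_file(target, auto_download=True, fetch=fake_fetch)
    resolved_exists = resolved.exists()

print(raised, resolved_exists)
```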

validate(document)

Validate a C-CDA document against Schematron rules.

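Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| document | Union[_Element, str, bytes, Path] | Document to validate: an etree._Element (parsed XML element), a str (XML string or file path), bytes (XML bytes), or a Path (path to an XML file). | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with Schematron validation findings |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file path doesn't exist. In the class source shown above, this exception is caught inside validate and reported as an ERROR issue with code FILE_NOT_FOUND rather than propagated. |
| XMLSyntaxError | If the document is not well-formed XML. In the class source shown above, this exception is caught inside validate and reported as an ERROR issue with code XML_SYNTAX_ERROR rather than propagated. |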
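
Because the `validate` source above records parsing and file errors as issues, callers can branch on the issue `code` instead of wrapping the call in try/except. The sketch below reproduces that exception-to-code mapping with the standard library parser. `parse_or_report` and its tuple result are hypothetical simplifications, and `xml.etree.ElementTree.ParseError` stands in for lxml's `XMLSyntaxError`.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple, Union


def parse_or_report(document: Union[bytes, Path]) -> List[Tuple[str, str]]:
    """Try to parse; return (code, message) pairs instead of raising."""
    issues: List[Tuple[str, str]] = []
    try:
        if isinstance(document, Path):
            if not document.exists():
                raise FileNotFoundError(f"File not found: {document}")
            ET.parse(str(document))
        else:
            ET.fromstring(document)
    except ET.ParseError as exc:
        issues.append(("XML_SYNTAX_ERROR", f"XML syntax error: {exc}"))
    except FileNotFoundError as exc:
        issues.append(("FILE_NOT_FOUND", str(exc)))
    except Exception as exc:
        # Anything else is reported under a generic code.
        issues.append(("SCHEMATRON_ERROR", f"Schematron validation error: {exc}"))
    return issues


ok = parse_or_report(b"<ClinicalDocument/>")
broken = parse_or_report(b"<ClinicalDocument>")
missing = parse_or_report(Path("no-such-file-for-this-example.xml"))

print(ok)
print([code for code, _ in broken])
print([code for code, _ in missing])
```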

validate_file(file_path)

Convenience method to validate a file.

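Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Union[str, Path] | Path to the XML file | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with Schematron validation findings |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file doesn't exist. Because this method delegates to validate, the class source shown above reports a missing file as an ERROR issue with code FILE_NOT_FOUND rather than propagating the exception. |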

validate_string(xml_string)

Convenience method to validate an XML string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xml_string | str | XML document as a string | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with Schematron validation findings |

validate_bytes(xml_bytes)

Convenience method to validate XML bytes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xml_bytes | bytes | XML document as bytes | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with Schematron validation findings |

Base Validator

BaseValidator

Bases: ABC

Abstract base class for C-CDA validators.

All validators should inherit from this class and implement the validate method.
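
A custom validator only has to subclass the base class and implement `validate`. The sketch below is self-contained, so it defines small stand-ins named `BaseValidator` and `ValidationResult` instead of importing the real ones. Treating `is_valid` as "no errors recorded" is an assumption, the stand-in accepts only bytes, and `RootElementValidator` is a hypothetical example class.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Stand-in result type (assumed shape: three issue lists)."""
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    infos: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Assumption for this sketch: valid means no errors were recorded.
        return not self.errors


class BaseValidator(ABC):
    """Stand-in for the abstract base class documented here."""

    @abstractmethod
    def validate(self, document: bytes) -> ValidationResult:
        ...


class RootElementValidator(BaseValidator):
    """Hypothetical validator: checks that the root element is ClinicalDocument."""

    def validate(self, document: bytes) -> ValidationResult:
        result = ValidationResult()
        root = ET.fromstring(document)
        # Strip any "{namespace}" prefix before comparing the local name.
        local_name = root.tag.rsplit("}", 1)[-1]
        if local_name != "ClinicalDocument":
            result.errors.append(f"Unexpected root element: {local_name}")
        return result


validator = RootElementValidator()
good = validator.validate(b'<ClinicalDocument xmlns="urn:hl7-org:v3"/>')
bad = validator.validate(b"<Patient/>")
print(good.is_valid, bad.is_valid, bad.errors)
```

Instantiating the abstract base directly raises TypeError, which is what forces subclasses to provide `validate`.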

Source code in ccdakit/validators/base.py
class BaseValidator(ABC):
    """
    Abstract base class for C-CDA validators.

    All validators should inherit from this class and implement
    the validate method.
    """

    @abstractmethod
    def validate(self, document: Union[etree._Element, str, bytes, Path]) -> ValidationResult:
        """
        Validate a C-CDA document.

        Args:
            document: Document to validate. Can be:
                - etree._Element: Parsed XML element
                - str: XML string or file path
                - bytes: XML bytes
                - Path: Path to XML file

        Returns:
            ValidationResult with errors, warnings, and info messages

        Raises:
            FileNotFoundError: If file path doesn't exist
            etree.XMLSyntaxError: If document is not well-formed XML
        """
        pass

    def _parse_document(self, document: Union[etree._Element, str, bytes, Path]) -> etree._Element:
        """
        Parse document into an lxml Element.

        Args:
            document: Document in various formats

        Returns:
            Parsed XML element

        Raises:
            FileNotFoundError: If file path doesn't exist
            etree.XMLSyntaxError: If document is not well-formed XML
        """
        if isinstance(document, etree._Element):
            return document

        if isinstance(document, Path):
            if not document.exists():
                raise FileNotFoundError(f"File not found: {document}")
            return etree.parse(str(document)).getroot()

        if isinstance(document, str):
            # Check if it looks like XML (starts with < or whitespace then <)
            stripped = document.lstrip()
            if stripped.startswith("<"):
                # Parse as XML string
                return etree.fromstring(document.encode("utf-8"))
            # Otherwise try as file path
            path = Path(document)
            if path.exists():
                return etree.parse(str(path)).getroot()
            # If not found, try parsing as XML anyway (might be malformed)
            return etree.fromstring(document.encode("utf-8"))

        if isinstance(document, bytes):
            return etree.fromstring(document)

        raise TypeError(
            f"Unsupported document type: {type(document)}. "
            "Expected etree._Element, str, bytes, or Path"
        )
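
The branch order in _parse_document matters mostly for str input: a string is treated as XML when it starts with "<" after leading whitespace, and as a file path otherwise. The sketch below mirrors that order using the standard library's xml.etree.ElementTree in place of lxml, so it illustrates the branching only and is not ccdakit's implementation; parse_any is a hypothetical name.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Union


def parse_any(document: Union[ET.Element, str, bytes, Path]) -> ET.Element:
    """Hypothetical stand-in that follows the same branch order as _parse_document."""
    if isinstance(document, ET.Element):
        return document  # already parsed
    if isinstance(document, Path):
        if not document.exists():
            raise FileNotFoundError(f"File not found: {document}")
        return ET.parse(str(document)).getroot()
    if isinstance(document, str):
        if document.lstrip().startswith("<"):
            return ET.fromstring(document)  # looks like XML markup
        path = Path(document)
        if path.exists():
            return ET.parse(str(path)).getroot()  # treat as a file path
        return ET.fromstring(document)  # let the parser report the error
    if isinstance(document, bytes):
        return ET.fromstring(document)
    raise TypeError(f"Unsupported document type: {type(document)}")


xml_text = "<ClinicalDocument><id/></ClinicalDocument>"
with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp) / "doc.xml"
    xml_file.write_text(xml_text, encoding="utf-8")
    tags = [
        parse_any(xml_text).tag,           # XML string
        parse_any(xml_text.encode()).tag,  # XML bytes
        parse_any(xml_file).tag,           # Path object
        parse_any(str(xml_file)).tag,      # path given as a plain string
    ]
print(tags)
```

One practical consequence of this order: a mistyped path passed as str ends up in the XML parser and surfaces as a parse error rather than FileNotFoundError. Passing a Path object avoids the ambiguity.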

Functions

validate(document) abstractmethod

Validate a C-CDA document.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | Union[_Element, str, bytes, Path] | Document to validate. Can be: etree._Element (parsed XML element), str (XML string or file path), bytes (XML bytes), or Path (path to XML file) | required |

Returns:

| Type | Description |
|---|---|
| ValidationResult | ValidationResult with errors, warnings, and info messages |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If file path doesn't exist |
| XMLSyntaxError | If document is not well-formed XML |

Source code in ccdakit/validators/base.py
@abstractmethod
def validate(self, document: Union[etree._Element, str, bytes, Path]) -> ValidationResult:
    """
    Validate a C-CDA document.

    Args:
        document: Document to validate. Can be:
            - etree._Element: Parsed XML element
            - str: XML string or file path
            - bytes: XML bytes
            - Path: Path to XML file

    Returns:
        ValidationResult with errors, warnings, and info messages

    Raises:
        FileNotFoundError: If file path doesn't exist
        etree.XMLSyntaxError: If document is not well-formed XML
    """
    pass
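
A concrete validator only has to implement validate. The following sketch shows the shape of such a subclass with stand-in types so it runs without ccdakit or lxml: SimpleResult, Validator, and RootTagValidator are hypothetical names, and the real ValidationResult may expose a different constructor and more fields than the is_valid and errors attributes used here.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleResult:
    """Hypothetical stand-in for ValidationResult."""
    errors: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


class Validator(ABC):
    """Stand-in for BaseValidator: subclasses must provide validate()."""

    @abstractmethod
    def validate(self, document: str) -> SimpleResult:
        ...


class RootTagValidator(Validator):
    """Checks only that the root element has the expected tag."""

    def __init__(self, expected_tag: str):
        self.expected_tag = expected_tag

    def validate(self, document: str) -> SimpleResult:
        root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
        result = SimpleResult()
        if root.tag != self.expected_tag:
            result.errors.append(
                f"Expected root <{self.expected_tag}>, found <{root.tag}>"
            )
        return result


validator = RootTagValidator("ClinicalDocument")
ok = validator.validate("<ClinicalDocument/>")
bad = validator.validate("<Patient/>")
print(ok.is_valid, bad.is_valid, bad.errors)
```

Because validate is abstract, instantiating the base class directly raises TypeError; only subclasses that override it can be created.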

Validation Rules

ValidationRule

Bases: ABC

Base class for custom validation rules.

Example

class MyCustomRule(ValidationRule):
    def __init__(self):
        super().__init__(
            name="my_custom_rule",
            description="Validates custom business logic"
        )

    def validate(self, document: etree._Element) -> List[ValidationIssue]:
        issues = []
        # Implement validation logic
        return issues

Source code in ccdakit/validators/rules.py
class ValidationRule(ABC):
    """
    Base class for custom validation rules.

    Example:
        class MyCustomRule(ValidationRule):
            def __init__(self):
                super().__init__(
                    name="my_custom_rule",
                    description="Validates custom business logic"
                )

            def validate(self, document: etree._Element) -> List[ValidationIssue]:
                issues = []
                # Implement validation logic
                return issues
    """

    def __init__(self, name: str, description: str):
        """
        Initialize validation rule.

        Args:
            name: Unique identifier for the rule
            description: Human-readable description of what the rule checks
        """
        self.name = name
        self.description = description

    @abstractmethod
    def validate(self, document: etree._Element) -> List[ValidationIssue]:
        """
        Apply rule to document.

        Args:
            document: Parsed C-CDA XML document element

        Returns:
            List of validation issues found (empty list if valid)
        """
        raise NotImplementedError(f"Rule '{self.name}' must implement validate()")

    def __repr__(self) -> str:
        """String representation of rule."""
        return f"<ValidationRule: {self.name}>"

Functions

__init__(name, description)

Initialize validation rule.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Unique identifier for the rule | required |
| description | str | Human-readable description of what the rule checks | required |

Source code in ccdakit/validators/rules.py
def __init__(self, name: str, description: str):
    """
    Initialize validation rule.

    Args:
        name: Unique identifier for the rule
        description: Human-readable description of what the rule checks
    """
    self.name = name
    self.description = description

validate(document) abstractmethod

Apply rule to document.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | _Element | Parsed C-CDA XML document element | required |

Returns:

| Type | Description |
|---|---|
| List[ValidationIssue] | List of validation issues found (empty list if valid) |

Source code in ccdakit/validators/rules.py
@abstractmethod
def validate(self, document: etree._Element) -> List[ValidationIssue]:
    """
    Apply rule to document.

    Args:
        document: Parsed C-CDA XML document element

    Returns:
        List of validation issues found (empty list if valid)
    """
    raise NotImplementedError(f"Rule '{self.name}' must implement validate()")
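
To make the contract concrete, here is a minimal sketch of a rule that returns one issue per problem and an empty list when the document passes. It re-declares a small ValidationRule with the same constructor shown above and uses a hypothetical Issue dataclass in place of ValidationIssue, whose real fields are not shown in this reference. The standard library parser stands in for lxml, and RequiredChildRule is an illustrative name.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Hypothetical stand-in for ValidationIssue."""
    rule: str
    message: str


class ValidationRule(ABC):
    """Local copy of the base-class shape, for illustration only."""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    @abstractmethod
    def validate(self, document: ET.Element) -> List[Issue]:
        ...


class RequiredChildRule(ValidationRule):
    """Reports an issue when a required direct child element is missing."""

    def __init__(self, child_tag: str):
        super().__init__(
            name=f"required_child_{child_tag}",
            description=f"Root element must contain <{child_tag}>",
        )
        self.child_tag = child_tag

    def validate(self, document: ET.Element) -> List[Issue]:
        if document.find(self.child_tag) is None:
            return [Issue(self.name, f"Missing <{self.child_tag}> element")]
        return []


rule = RequiredChildRule("title")
with_title = ET.fromstring("<ClinicalDocument><title>Summary</title></ClinicalDocument>")
without_title = ET.fromstring("<ClinicalDocument><id/></ClinicalDocument>")
print(rule.validate(with_title))
print(rule.validate(without_title))
```

The sample documents omit namespaces for brevity. Real C-CDA documents use the urn:hl7-org:v3 namespace, so element lookups in an actual rule would need namespace-qualified names.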

__repr__()

String representation of rule.

Source code in ccdakit/validators/rules.py
def __repr__(self) -> str:
    """String representation of rule."""
    return f"<ValidationRule: {self.name}>"

Validation rule classes are available in:

- ccdakit.validators.rules: Base rules and composites
- ccdakit.validators.common_rules: Common reusable rules

See the Validation Guide for usage examples.

Schema Manager

SchemaManager

Manager for C-CDA XSD schemas.

Helps with schema discovery, downloading, and path management.

Source code in ccdakit/validators/utils.py
class SchemaManager:
    """
    Manager for C-CDA XSD schemas.

    Helps with schema discovery, downloading, and path management.
    """

    def __init__(self, schema_dir: Optional[Path] = None):
        """
        Initialize schema manager.

        Args:
            schema_dir: Directory containing schemas. Defaults to project's schemas/ directory.
        """
        self.schema_dir = schema_dir or DEFAULT_SCHEMA_DIR
        self.schema_dir.mkdir(parents=True, exist_ok=True)

    def is_installed(self) -> bool:
        """
        Check if C-CDA schemas are installed.

        Returns:
            True if CDA.xsd exists in schema directory
        """
        return self.get_cda_schema_path().exists()

    def get_cda_schema_path(self) -> Path:
        """
        Get path to main CDA.xsd schema file.

        Returns:
            Path to CDA.xsd (may not exist)
        """
        return self.schema_dir / "CDA.xsd"

    def get_schema_info(self) -> dict:
        """
        Get information about installed schemas.

        Returns:
            Dictionary with schema installation status and paths
        """
        cda_path = self.get_cda_schema_path()
        return {
            "installed": cda_path.exists(),
            "schema_dir": str(self.schema_dir),
            "cda_schema": str(cda_path),
            "cda_exists": cda_path.exists(),
            "files": [f.name for f in self.schema_dir.iterdir() if f.is_file()],
        }

    def download_schemas(
        self,
        version: str = "R2.1",
        url: Optional[str] = None,
        force: bool = False,
    ) -> Tuple[bool, str]:
        """
        Download C-CDA schemas from HL7.

        Note: This is a helper function, but schemas may need to be
        downloaded manually from HL7's website due to licensing.

        Args:
            version: C-CDA version (R2.1 or R2.0)
            url: Custom download URL (overrides version)
            force: Force re-download even if schemas exist

        Returns:
            Tuple of (success: bool, message: str)

        Raises:
            ValueError: If version is not supported
        """
        if self.is_installed() and not force:
            return (
                True,
                f"Schemas already installed at {self.schema_dir}. Use force=True to re-download.",
            )

        if url is None:
            if version not in SCHEMA_URLS:
                raise ValueError(
                    f"Unsupported version: {version}. "
                    f"Supported versions: {list(SCHEMA_URLS.keys())}"
                )
            url = SCHEMA_URLS[version]

        try:
            # Download zip file
            zip_path = self.schema_dir / "schemas.zip"
            urlretrieve(url, zip_path)

            # Extract schemas
            with zipfile.ZipFile(zip_path, "r") as zip_ref:
                zip_ref.extractall(self.schema_dir)

            # Clean up zip file
            zip_path.unlink()

            return True, f"Schemas downloaded successfully to {self.schema_dir}"

        except Exception as e:
            return False, f"Failed to download schemas: {e}"

    def print_installation_instructions(self) -> None:
        """Print instructions for manually downloading schemas."""
        instructions = f"""
C-CDA XSD Schema Installation Instructions
==========================================

The C-CDA XSD schemas must be downloaded from HL7 due to licensing restrictions.

Method 1: Download from HL7 (Recommended)
------------------------------------------
1. Visit the HL7 C-CDA download page:
   - R2.1: https://www.hl7.org/implement/standards/product_brief.cfm?product_id=492
   - R2.0: https://www.hl7.org/implement/standards/product_brief.cfm?product_id=379

2. Download the schema package (e.g., "CCDA_R2.1_Schemas.zip")

3. Extract the following files to: {self.schema_dir}
   - CDA.xsd (main schema file)
   - POCD_MT000040_CCDA.xsd
   - datatypes.xsd
   - voc.xsd
   - NarrativeBlock.xsd
   - SDTC/ directory (if available)

Method 2: Use Schema Manager (Automated)
-----------------------------------------
>>> from ccdakit.validators.utils import SchemaManager
>>> manager = SchemaManager()
>>> success, message = manager.download_schemas(version="R2.1")
>>> print(message)

Note: Automated download may not work due to HL7's licensing requirements.
      Manual download is recommended.

Verification
------------
After installation, verify schemas are available:

>>> manager = SchemaManager()
>>> info = manager.get_schema_info()
>>> print(info)

The 'cda_exists' field should be True.
"""
        print(instructions)

Functions

__init__(schema_dir=None)

Initialize schema manager.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| schema_dir | Optional[Path] | Directory containing schemas. Defaults to project's schemas/ directory. | None |

Source code in ccdakit/validators/utils.py
def __init__(self, schema_dir: Optional[Path] = None):
    """
    Initialize schema manager.

    Args:
        schema_dir: Directory containing schemas. Defaults to project's schemas/ directory.
    """
    self.schema_dir = schema_dir or DEFAULT_SCHEMA_DIR
    self.schema_dir.mkdir(parents=True, exist_ok=True)

is_installed()

Check if C-CDA schemas are installed.

Returns:

| Type | Description |
|---|---|
| bool | True if CDA.xsd exists in schema directory |

Source code in ccdakit/validators/utils.py
def is_installed(self) -> bool:
    """
    Check if C-CDA schemas are installed.

    Returns:
        True if CDA.xsd exists in schema directory
    """
    return self.get_cda_schema_path().exists()

get_cda_schema_path()

Get path to main CDA.xsd schema file.

Returns:

| Type | Description |
|---|---|
| Path | Path to CDA.xsd (may not exist) |

Source code in ccdakit/validators/utils.py
def get_cda_schema_path(self) -> Path:
    """
    Get path to main CDA.xsd schema file.

    Returns:
        Path to CDA.xsd (may not exist)
    """
    return self.schema_dir / "CDA.xsd"

get_schema_info()

Get information about installed schemas.

Returns:

| Type | Description |
|---|---|
| dict | Dictionary with schema installation status and paths |

Source code in ccdakit/validators/utils.py
def get_schema_info(self) -> dict:
    """
    Get information about installed schemas.

    Returns:
        Dictionary with schema installation status and paths
    """
    cda_path = self.get_cda_schema_path()
    return {
        "installed": cda_path.exists(),
        "schema_dir": str(self.schema_dir),
        "cda_schema": str(cda_path),
        "cda_exists": cda_path.exists(),
        "files": [f.name for f in self.schema_dir.iterdir() if f.is_file()],
    }
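
The dictionary returned by get_schema_info can serve as a pre-flight check before constructing a validator. The sketch below rebuilds the same dictionary with a free function, schema_info (a hypothetical name), against a temporary directory so it runs without ccdakit installed. The empty files it creates are placeholders, not real schemas, and the file names are sorted here only to make the output stable.

```python
import tempfile
from pathlib import Path


def schema_info(schema_dir: Path) -> dict:
    """Hypothetical free-function version of the get_schema_info logic."""
    cda_path = schema_dir / "CDA.xsd"
    return {
        "installed": cda_path.exists(),
        "schema_dir": str(schema_dir),
        "cda_schema": str(cda_path),
        "cda_exists": cda_path.exists(),
        "files": sorted(f.name for f in schema_dir.iterdir() if f.is_file()),
    }


with tempfile.TemporaryDirectory() as tmp:
    schema_dir = Path(tmp)
    before = schema_info(schema_dir)

    # Placeholder files; a real install would contain the HL7 schemas.
    (schema_dir / "CDA.xsd").touch()
    (schema_dir / "datatypes.xsd").touch()
    after = schema_info(schema_dir)

print(before["installed"], before["files"])
print(after["installed"], after["files"])
```

Note that "installed" and "cda_exists" are computed from the same check, so they always agree; neither confirms that the companion schema files are present.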

download_schemas(version='R2.1', url=None, force=False)

Download C-CDA schemas from HL7.

Note: This helper attempts an automated download. If it fails, the schemas may need to be downloaded manually from HL7's website because of licensing restrictions.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| version | str | C-CDA version (R2.1 or R2.0) | 'R2.1' |
| url | Optional[str] | Custom download URL (overrides version) | None |
| force | bool | Force re-download even if schemas exist | False |

Returns:

| Type | Description |
|---|---|
| Tuple[bool, str] | Tuple of (success: bool, message: str) |

Raises:

| Type | Description |
|---|---|
| ValueError | If version is not supported |

Source code in ccdakit/validators/utils.py
def download_schemas(
    self,
    version: str = "R2.1",
    url: Optional[str] = None,
    force: bool = False,
) -> Tuple[bool, str]:
    """
    Download C-CDA schemas from HL7.

    Note: This is a helper function, but schemas may need to be
    downloaded manually from HL7's website due to licensing.

    Args:
        version: C-CDA version (R2.1 or R2.0)
        url: Custom download URL (overrides version)
        force: Force re-download even if schemas exist

    Returns:
        Tuple of (success: bool, message: str)

    Raises:
        ValueError: If version is not supported
    """
    if self.is_installed() and not force:
        return (
            True,
            f"Schemas already installed at {self.schema_dir}. Use force=True to re-download.",
        )

    if url is None:
        if version not in SCHEMA_URLS:
            raise ValueError(
                f"Unsupported version: {version}. "
                f"Supported versions: {list(SCHEMA_URLS.keys())}"
            )
        url = SCHEMA_URLS[version]

    try:
        # Download zip file
        zip_path = self.schema_dir / "schemas.zip"
        urlretrieve(url, zip_path)

        # Extract schemas
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(self.schema_dir)

        # Clean up zip file
        zip_path.unlink()

        return True, f"Schemas downloaded successfully to {self.schema_dir}"

    except Exception as e:
        return False, f"Failed to download schemas: {e}"
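
The download path reduces to three steps: retrieve a zip archive, extract it into the schema directory, and delete the archive. The sketch below exercises those steps offline by pointing urlretrieve at a file:// URL for a locally built archive, which is also one way the url parameter could be aimed at a local mirror. fetch_and_extract is a hypothetical helper, and the archive contents are placeholders rather than real schemas.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Tuple
from urllib.request import urlretrieve


def fetch_and_extract(url: str, schema_dir: Path) -> Tuple[bool, str]:
    """Hypothetical helper following the same download, extract, clean-up order."""
    schema_dir.mkdir(parents=True, exist_ok=True)
    zip_path = schema_dir / "schemas.zip"
    try:
        urlretrieve(url, zip_path)
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(schema_dir)
        return True, f"Schemas extracted to {schema_dir}"
    except Exception as exc:  # report failure as a tuple instead of raising
        return False, f"Failed to download schemas: {exc}"
    finally:
        zip_path.unlink(missing_ok=True)  # remove the archive on success or failure


with tempfile.TemporaryDirectory() as tmp:
    # Build a small local archive to stand in for the HL7 package.
    source_zip = Path(tmp) / "package.zip"
    with zipfile.ZipFile(source_zip, "w") as archive:
        archive.writestr("CDA.xsd", "<!-- placeholder schema -->")

    target = Path(tmp) / "schemas"
    ok, message = fetch_and_extract(source_zip.as_uri(), target)
    installed = (target / "CDA.xsd").exists()
    leftover = (target / "schemas.zip").exists()

    # A URL that cannot be retrieved yields (False, message) rather than an exception.
    failed_ok, failed_msg = fetch_and_extract((Path(tmp) / "nope.zip").as_uri(), target)

print(ok, installed, leftover, failed_ok)
```

Unlike the source above, the sketch removes the archive in a finally block, so a failed extraction does not leave schemas.zip behind. Callers of either version should check the boolean in the returned tuple, since download errors are reported there and not raised.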

print_installation_instructions()

Print instructions for manually downloading schemas.

Source code in ccdakit/validators/utils.py
    def print_installation_instructions(self) -> None:
        """Print instructions for manually downloading schemas."""
        instructions = f"""
C-CDA XSD Schema Installation Instructions
==========================================

The C-CDA XSD schemas must be downloaded from HL7 due to licensing restrictions.

Method 1: Download from HL7 (Recommended)
------------------------------------------
1. Visit the HL7 C-CDA download page:
   - R2.1: https://www.hl7.org/implement/standards/product_brief.cfm?product_id=492
   - R2.0: https://www.hl7.org/implement/standards/product_brief.cfm?product_id=379

2. Download the schema package (e.g., "CCDA_R2.1_Schemas.zip")

3. Extract the following files to: {self.schema_dir}
   - CDA.xsd (main schema file)
   - POCD_MT000040_CCDA.xsd
   - datatypes.xsd
   - voc.xsd
   - NarrativeBlock.xsd
   - SDTC/ directory (if available)

Method 2: Use Schema Manager (Automated)
-----------------------------------------
>>> from ccdakit.validators.utils import SchemaManager
>>> manager = SchemaManager()
>>> success, message = manager.download_schemas(version="R2.1")
>>> print(message)

Note: Automated download may not work due to HL7's licensing requirements.
      Manual download is recommended.

Verification
------------
After installation, verify schemas are available:

>>> manager = SchemaManager()
>>> info = manager.get_schema_info()
>>> print(info)

The 'cda_exists' field should be True.
"""
        print(instructions)