Chapter 4

Methodology

4.1 Introduction: From Theory Contract to Method Contract

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract (this section)
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract

Theoretical specification without procedural commitment is methodologically inactive. A demanding inheritance burden can fix what an artefact suite must achieve and under what warrant, yet leave wholly open how its claims are produced, how the supporting evidence is assessed, and how the resulting verdicts are adjudicated. That gap is where a design theory either becomes testable or remains a conceptually attractive promise; a proposition that names no procedure for refuting it cannot be said to make an empirical demand at all. Chapter 3 closed precisely at this edge. Five propositions (Proposition 1 to Proposition 5), five meta-requirements (Meta-Requirement 1 to Meta-Requirement 5), and a Gregor and Jones design-theory anatomy established the warrant for the suite, but stopped short of any executable procedure for putting that warrant under load. The work here resolves the gap by specifying the operational bridge between theory and execution, so that each inherited proposition arrives at the artefact chapters already equipped with the measure, baseline, and adjudication rule that would let it fail.

The remit is bounded to that bridge and no further. It states how claims will be tested before Chapters 5 to 9 execute those tests, and how contribution boundaries are preserved for Chapters 10 and 11. The orientation is protocol-facing rather than rhetorical, and the distinction matters: a method contract earns its authority by being specified in advance of the evidence, not by being narrated after it. It does not review design-science literature, which is Chapter 3’s justificatory knowledge; it does not construct artefacts, the task of Chapters 5 to 9; it does not report evaluation results, reserved for Chapter 10; and it does not interpret contributions, the work of Chapter 11. Everything downstream conforms to the contract fixed here. The remainder of this section accepts the inheritance from theory, names the argument spine that organises the commitments, grounds the contract in the problem domain through six environment requirements, and previews the cumulative sequence of the chapter.

Inheritance acceptance. Six methodological elements must be defined before any artefact chapter can proceed.¹ Chapter 3 Section 3.6 imposes each as a binding inheritance obligation, and each is accepted here in turn.

The unit of change. Design features (DF-05A-01 to DF-08D-02) constitute the unit of change. Each specifies a targeted artefact capability that satisfies one or more environment requirements through a declared mechanism object, so that every intervention can be evaluated individually rather than as part of an undifferentiated whole.
The invariant format. Evaluation measures (EM-4W-01 to EM-09-04) supply the invariant format in which criteria are expressed. Each carries a metric definition, a baseline, a unit, a threshold, and its validity risks, so that no measure is admitted without a stated way of failing.
The trigger-to-check mapping. Four design-evaluate cycles (Cycle 1 to Cycle 4) fix which design changes trigger which evaluation checks, binding artefact construction to its corresponding evaluation logic rather than leaving the linkage to be inferred after the fact.
The baseline. Comparator conditions are declared for each evaluation question and grounded in current-practice workflows, so that every proposition is judged against a defensible standard rather than an arbitrary reference point.
The evidence objects. Five evidence classes (EVID-P1-INTERPRETABILITY through EVID-P5-BURDEN) name the specific data artefacts that may support or refute each claim, each accompanied by threat annotations and uncertainty boundaries.
The adjudication rules. A four-result scheme (pass, fail, indeterminate, and boundary-limited acceptance) classifies the evidence, with explicit criteria and a dedicated composite rule for Proposition 5.

These six are binding commitments, not editorial preferences. Taken together they convert the inheritance from theory into a closed specification: a discrete unit that can be altered, a fixed format in which success is expressed, a declared mapping from change to check, a defensible standard of comparison, a named set of evidence-bearing artefacts, and a rule for reading those artefacts as verdicts. No subsequent chapter can rescue a missing methodological specification with an attractive artefact or a persuasive narrative; the obligation is discharged here or it remains open.

Argument spine. Four spine codes organise the commitments from problem to contribution. SP-01 (Problem) formalises the environment burden through six environment requirements (ER-01 to ER-06), each a governance obligation derived from the housing-adaptation problem established in Chapters 1 and 2. SP-02 (Artefact) instantiates the mechanism objects (interface, invariant, transformation, and trigger) through design features distributed across the artefact chapters. SP-03 (Evaluation) declares the evaluation logic through five evaluation questions (EQ-01 to EQ-05) with linked measures, pre-registered baselines, thresholds, and adjudication rules. SP-04 (Contribution) extracts reusable prescriptive outputs through contribution rules (TR-01 to TR-04) with explicit scope and falsifier discipline. The spine codes recur throughout the chapter as the connective tissue that keeps each procedural choice traceable to its purpose.

Environment requirements overview. Six environment requirements ground the method contract in the problem domain. Each is a stakeholder-facing governance obligation derived from Chapters 1 and 2, not invented by the method chapter, and each is introduced here at overview level; operationalisation is distributed across the evaluation-linkage and ethical-stakeholder-mapping sections.

ER-01: reduce life-course burden for households and residents through semantically stable representations.
ER-02: reduce tacit interpretation load for practitioners and assessors through explicit interfaces.
ER-03: provide credible utility evidence for project owners and providers through matched comparisons.
ER-04: ensure relation-level interpretability for SDA participants and assessors.
ER-05: reduce specialist dependence for design and review teams through transparent trigger-to-check logic.
ER-06: prevent exception-driven governance drift for maintenance and governance actors.

Each requirement traces to a specific source in the earlier problem analysis. The derivation trace below maps each requirement to that source so that examiners can locate the originating evidence in Chapter 1 and Chapter 2:

ER	Governance Obligation	Source Evidence (Chapter 1 / Chapter 2)
ER-01	Reduce life-course burden	Chapter 2 Section 2.1: diachronic housing mismatch as dynamic person-environment fit, with residential inertia developed at Section 2.2; Chapter 1 Section 1.2: adaptation cost as barrier
ER-02	Reduce tacit interpretation load	Chapter 2 Section 2.10: representational-governance problem class; synchronous artefact interpretation variance
ER-03	Provide credible utility evidence	Chapter 2 Section 2.11: meta-requirement MR5 (evaluation-readiness); verification burden evidence
ER-04	Ensure relation-level interpretability	Chapter 2 Section 2.10: representational-governance problem class; interpretability across handover and round-trip continuity
ER-05	Reduce specialist dependence	Chapter 2 Section 2.5: interface discipline gap; ungoverned complement coupling
ER-06	Prevent exception-driven governance drift	Chapter 2 Section 2.10: representational-governance problem class; governance drift under ungoverned representational form and unauditable variation

These requirements are formalised here as the testable obligations that the artefact suite must address and the evaluation programme must assess. They convert a diffuse problem narrative into a finite set of governance targets against which procedural success can later be judged, and each names the stakeholder who bears the burden when the obligation goes unmet: households and residents under ER-01, practitioners and assessors under ER-02 and ER-04, owners and providers under ER-03, design and review teams under ER-05, and maintenance and governance actors under ER-06. Anchoring the contract to those bearers, rather than to the convenience of measurement, is what keeps the evaluation programme answerable to the problem domain it claims to serve.

Section roadmap. The chapter builds cumulatively across seven sections, each adding a layer to the contract before the artefact chapters draw on it; later sections presuppose the constructs that earlier sections define, so the order is load-bearing rather than presentational. Section 4.2 (Research Paradigm and the Inheritance from Theory) justifies design science as the governing paradigm, positions the contribution type and theory-maturity level, and binds Chapter 3’s locked commitments to their operationalisation surfaces. Section 4.3 (Design Cycles, the Data Pipeline, and the Census Basis of Evidence) structures artefact construction through the four cycles, traces provenance through the layered pipeline, and explains why the corpus is treated as a census rather than a sample. Section 4.4 (Evaluation Questions and Measures) declares the five evaluation questions with their linked measures and pre-registered baselines. Section 4.5 (Adjudication, Thresholds, and the P5 Composite Criterion) fixes the four-result scheme and operationalises the composite rule for integrated burden. Section 4.6 (Validity and Reliability) registers threats across the technical, purpose, and instrument dimensions and defines reliability as repeated-execution consistency. Section 4.7 (Ethical Considerations and the Handoff Contract) maps stakeholder groups to ethical obligations and specifies what Chapters 5 to 11 are bound to execute, evaluate, and report.

4.2 Research Paradigm and the Inheritance from Theory

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory (this section)
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract

Design Science Research is the governing paradigm here because the thesis object is a designed artefact suite whose evaluation demands construction-and-test logic, not observational explanation or interpretive reconstruction. The research problem set out in Chapters 1 and 2 is interventionist: representational governance failures (semantic drift across handovers, ungoverned interface coupling, non-replayable transformations, and unauditable variation) are resolved by building and evaluating new artefacts, not by observing existing practice.² Three reasons ground the choice. First, the research object is a designed artefact suite whose properties are specified before construction and judged against pre-declared criteria, satisfying Hevner et al.’s Guidelines 1 through 3 (design as artefact, problem relevance, design evaluation).³ Second, Action Research was considered and rejected: it requires a single organisational setting in which the researcher intervenes in ongoing practice, whereas this system is constructed from regulatory standards text and evaluated through demonstration cases rather than situated intervention.⁴ Third, interpretive case study was rejected because the artefact does not yet exist as a phenomenon to observe; the demonstration cases are constructed test contexts governed by the evaluation protocol, not naturally occurring settings from which explanatory theory is induced.⁵ The selection rests on problem-artefact fit.

The commitments adopted here inherit from three works that pre-date the Hevner and Peffers consolidation, and naming them protects the methodology against the critique that DSR is a recent and consequently under-tested paradigm. Walls, Widmeyer and El Sawy formalised the information system design theory (kernel theories, meta-requirements, meta-design, design method, and testable hypotheses), fixing the form a design contribution must take to count as theoretical; this is the structural ancestor of the Gregor and Jones anatomy adopted in Chapter 3 Section 3.5, against which the propositions Proposition 1 through Proposition 5 are framed.⁶ Nunamaker, Chen and Purdin established the systems-development research methodology (concept, system architecture, analysis and design, then build-observe-evaluate), the precedent under which a DSR programme whose object is a working software artefact gains methodological standing.⁷ Kuechler and Vaishnavi analysed the anatomy of a DSR project, warranting the treatment of Proposition 1 through Proposition 5 as kernel-theory hypotheses that the design cycles can refine; on this reading the Cycle 4 exception-budget mechanism is a kernel-theory refinement, not methodological drift.⁸

The thesis follows the six-stage Peffers et al. DSRM, with each stage mapped to chapters so the process stays auditable. Problem identification and motivation occupy Chapters 1 and 2, where the environment requirements ER-01 to ER-06 formalise the problem. Definition of objectives spans Chapters 2 and 3, converting the meta-requirements Meta-Requirement 1 to Meta-Requirement 5 into the propositions Proposition 1 to Proposition 5. Design and development are distributed across the artefact chapters (Chapters 5 to 9), each instantiating design features under explicit mechanism objects and governed by the method contract declared in this chapter. Demonstration and evaluation occur in Chapter 10, which executes the pre-registered evaluation questions EQ-01 to EQ-05 with their declared measures. Communication spans Chapters 11 and 12, which extract contribution rules TR-01 to TR-04 under explicit scope and falsifier discipline. The process is iterative: refinement is documented within each artefact chapter’s design cycles rather than as undocumented inter-chapter revision. Hevner’s three-cycle view supplies a complementary lens: the relevance cycle connects to the problem environment (the ERs), the rigor cycle to the knowledge base (wicked-problem, complex-adaptive-system, modularity, platform, and Gregor and Jones theory), and the design cycle to artefact construction and evaluation.⁹

fig-ch4-04-dsr-hevner-alignment — **Thesis Alignment with Hevner’s Three-Cycle DSR Framework** Thesis positioning within Hevner et al.’s (2004) three-cycle design science research framework. The relevance cycle connects the housing-adaptation problem context to the environment requirements; the rigor cycle grounds the design in wicked-problem theory, modularity theory, and platform architecture; and the design cycle iterates between artefact construction (Chapters 5 to 9) and evaluation (Chapter 10), with bidirectional feedback to both.

Two positioning claims are kept on distinct axes and not conflated. In Gregor and Hevner’s contribution-type taxonomy the work is primarily exaptive: modularity theory, relational-graph topology, and text-as-notational-medium are each established in their home domains, and the integrated design theory for this housing-governance application class is the contribution we develop.¹⁰ A secondary improvement claim also holds, since the work offers a new representational solution for the known problem of diachronic housing adaptation. At the theory-type level the work is a nascent design theory, Level 2 in Gregor and Hevner’s maturity framework, specifying constructs, design principles, and form-function commitments that can be criticised, operationalised, and tested; progression to a well-developed theory (Level 3, corresponding to Gregor’s Type V) is contingent on the artefact-and-evaluation programme producing positive evidence.¹¹ The exaptive characterisation describes the kind of knowledge contribution; the nascent-to-well-developed trajectory describes how far along the evidential spectrum it has progressed. The artefact suite is classified at the thesis level as a composite of four March-and-Smith artefact types: the standardisation schema is a model with secondary method properties, the Governed Kernel Architecture a model with secondary construct properties, the notation a method with secondary construct properties, and the generator an instantiation with secondary method properties.¹² Each per-artefact classification is declared and justified in its chapter. These artefacts are the evidence machinery through which the single architectural claim is discharged, not four independent contributions.

Philosophical stance

The philosophical stance is pragmatist. The artefact suite is judged by its utility in improving representational governance (reducing interpretation variance, bounding verification scope, enabling transformation replay, and governing variation under shared rules) rather than by correspondence to an independently existing representational reality. This orientation is consistent with DSR’s design-oriented epistemology, in which knowledge is produced by creating artefacts that demonstrate, or fail to demonstrate, desired properties under specified conditions.

Pragmatism does not entail relativism about evidence. The evaluation protocol declared in Section 4.5 pre-commits to baselines, thresholds, and falsifiers that bind regardless of how attractive the design may appear, so the artefact can fail the evaluation honestly. A brief articulation of the stance is itself warranted: empirical analyses of DSR doctoral work report that a substantial share of theses leave the ontological or epistemological basis for judging artefact merit unstated, and this section closes that gap.

DSR’s own limitations are acknowledged rather than assumed away. Iivari identifies a transfer-boundary tension: design principles may not transfer without adaptation, and the methodology offers no systematic way to predict where transfer fails; here the artefact is designed for Australian SDA-governed housing, so transfer to other regulatory or cultural contexts is a boundary condition, not a guarantee.¹³ Venable raises the related concern that constructed-scenario evaluation provides weaker support for prescriptive claims than naturalistic field evidence.¹⁴ The study answers this by classifying its evaluation as artificial-summative under the FEDS Technical-Risk-and-Efficacy strategy: the declared evaluation ceiling is efficacy under constructed demonstration, not effectiveness under naturalistic field deployment, with naturalistic evaluation identified as future work. Contribution claims are bounded accordingly. The boundary conditions follow from this posture: the theory applies where adaptation recurs, where representational handover materially affects decisions, and where verification burden is a real constraint, while universal policy optimisation, one-shot bespoke settings, and contexts in which handover quality is non-critical lie out of scope.

Action Design Research acknowledgement

This study is not executed as a full Action Design Research programme. ADR requires embedded organisational intervention with concurrent theory development and practice shaping, whereas the present work constructs artefacts from regulatory text and evaluates them through demonstration cases rather than organisational deployment.¹⁵

The study nonetheless retains three ADR-informed disciplines for auditability. Intervention records document each design cycle’s change and its rationale; reflection records document what each evaluation learned and what was adjusted; and iteration records document each artefact chapter’s design-cycle history, including negative outcomes and abandoned alternatives. These records support audit, not claims about organisational practice.

All five of Chapter 3’s locked commitments are operationalised through the method contract, each with a declared primary and secondary operationalisation surface plus an adjudication anchor. The mapping is set out in the table below, relocated from the chapter introduction so the binding text sits alongside the paradigm that warrants it; primary operationalisation is where a commitment’s testable form is declared, secondary operationalisation where its evidential or methodological consequences are bound. The adjudication anchors cite the evidence-mechanism identifiers (the EM-4W-* and EM-09-* codes) and the evidence-object identifiers (the EVID-* codes) declared in Section 4.4. No commitment is operationalised solely at the rhetorical level.

Chapter 3 Section 3.6 Locked Commitment	Primary Operationalisation (this chapter)	Secondary Operationalisation (this chapter)	Adjudication Anchor
(i) Revision-governed adaptation	Section 4.3 cycle structure (Cycle 1 to Cycle 4): each cycle’s intent-change-evaluation-decision loop treats adaptation as iterative revision rather than terminal optimisation	Section 4.4 EQ-05 (integrated burden) with the Section 4.5 P5 composite criterion: burden is assessed across repeated change episodes rather than at a single static state	Four-result classification (pass / fail / indeterminate / boundary-limited acceptance) applied to repeated cycle outcomes
(ii) Modularity as interface-governed coupling control	Section 4.4 design-feature layer: `DF-05A-01` to `DF-08D-02` declare interface and trigger objects as the unit of change	Section 4.4 EQ-02 (verification scope) with the Section 4.6 technical-validity layer: bounded-edit verification is the test of interface-governed coupling	EM-4W-02 + EM-09-02 with `EVID-P2-CHECK-SCOPE`
(iii) Platform architecture (stable core + governed instance library)	Section 4.4 EQ-04 (governed variation) and the Cycle 4 specification: variant generation under shared rules with explicit exception budgeting	Section 4.7 ER-06 stakeholder mapping: governance actors absorb the consequence of exception drift if platform discipline fails	EM-4W-03 + EM-09-04 with `EVID-P4-VARIATION` and `EVID-P4-EXCEPTION-BUDGET`
(iv) Design theory as Gregor and Jones contract	Section 4.2 paradigm justification with Section 4.5 adjudication rules: establishes the design-theory machinery and operationalises the falsifier discipline declared in Chapter 3 Section 3.5	Section 4.6 validity layers: instrument and purpose validity controls protect the proposition register from instrument bias and case-mismatch threats	Four-result scheme with the P5 composite criterion
(v) Trigram-relational-graph-text substrate	Section 4.4 EQ-01 (interpretive divergence): primary test of P1 where the trigram is the atomic governance unit	Section 4.3 corpus provenance and pipeline architecture, with Section 4.4 EQ-02 and EQ-04: the topology and serialisation medium are exercised in interface and variation tests	EM-4W-01 + EM-09-01 with `EVID-P1-INTERPRETABILITY` and `EVID-P1-SDA-TRACE`

4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence (this section)
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract

The paradigm commitments set out in Section 4.2 become an auditable iteration structure of four design-evaluate cycles operating over a provenance-controlled data substrate. Hevner’s three-cycle view structures the design cycle as the iterative process through which artefact construction and evaluation proceed, taking inputs from the relevance cycle (problem requirements) and the rigor cycle (theoretical foundations) and returning tested evidence to both.¹⁶ Each cycle here follows a four-step structure: intent (what the cycle aims to achieve and why), change (what artefact construction or modification is undertaken), evaluation (what measures and evidence objects assess the change), and decision (whether the outcome supports continuation, adaptation, or escalation). The structure forces every artefact chapter to leave an auditable record of what was tried, observed, and concluded, including negative or boundary-limited outcomes. It operationalises Peffers et al.’s feedback loop between evaluation and design as a repeatable cycle rather than a single pass.¹⁷ We define four cycles, one per artefact, because each artefact discharges a distinct governance function with its own design-change object. A fifth cycle for Proposition 5 (integrated utility) was considered and rejected: Proposition 5’s evidence comes from the interaction of the Cycle 1 to Cycle 4 outputs exercised together in Chapter 10, not from a fifth construction target, and a cycle without a construction target would violate the requirement that each cycle produce a testable artefact change.

Four cycle specifications

The cycle table binds construction to evaluation. It records each cycle’s intent, change, evaluation, and decision logic together with the measures (EM-*) and evidence objects (EVID-*) that tie the cycle to the evaluation protocol in Section 4.4.

Cycle	Intent	Change	Evaluation	Decision	Linked Measures	Linked Evidence
Cycle 1	Stabilise standards semantics for cross-actor interpretability	Apply serialisation constraints in the standardisation schema (Chapter 5)	Compare interpretation divergence to baseline	Keep or adapt schema rules	`EM-4W-01`, `EM-09-01`	`EVID-P1-INTERPRETABILITY`, `EVID-P1-SDA-TRACE`
Cycle 2	Bound verification scope for local edits	Publish interface and trigger contracts in the Governed Kernel Architecture (Chapter 6)	Compare local-to-global check-scope ratios	Keep or escalate interface boundaries	`EM-4W-02`, `EM-09-02`	`EVID-P2-CHECK-SCOPE`, `EVID-P2-LOCAL-GLOBAL`
Cycle 3	Ensure transformation replay reliability	Formalise deterministic transformation logic in the notation (Chapter 7)	Run replay and invariant checks	Keep or refine transformation grammar	`EM-4W-02`, `EM-09-02`	`EVID-P3-REPLAY`, `EVID-P3-INVARIANTS`
Cycle 4	Preserve governance under variation	Apply rule-compliant generation with exception budgeting in the generator (Chapter 9)	Assess compliant variation and exception quality	Keep or refine exception policy	`EM-4W-03`, `EM-09-04`	`EVID-P4-VARIATION`, `EVID-P4-EXCEPTION-BUDGET`

Each cycle addresses one proposition. Cycle 1 tests whether the serialisation schema developed in Chapter 5 materially reduces interpretive divergence relative to the unserialised regulatory text (Proposition 1). Cycle 2 tests whether the declared interfaces and trigger contracts of the modular library in Chapter 6 convert broad reconstruction into local checking plus rule-governed escalation (Proposition 2). Cycle 3 tests whether the notation grammar of Chapter 7 yields deterministic replay with invariants retained across governed transformations (Proposition 3). Cycle 4 tests whether the procedural generation pipeline of Chapter 9 produces legitimate local variation without losing comparability, compatibility, or traceability (Proposition 4).

Decision-criterion specification

The four decision verbs are pre-registered with explicit triggering criteria so that artefact chapters cannot adjudicate cycle outcomes by retrospective narrative. A cycle outcome is classified by what its measures show, calibrated to the four-result scheme in Section 4.5. Keep applies when all linked measures meet their pre-registered thresholds and no falsifier from the Chapter 3 proposition-contract matrix fires: the configuration carries forward unchanged. Adapt applies when at least one measure is boundary-limited and the shortfall localises to a specific design-feature attribute rather than the mechanism object: the affected attribute is modified, not the mechanism, and the boundary is declared. Escalate applies when at least one measure fails with no admissible boundary explanation and the failure implicates the mechanism object, or when a Chapter 3 falsifier fires: the cycle halts and re-enters the design space at the mechanism level. Refine applies when at least one measure is indeterminate, the evidence being insufficient to classify pass or fail (for example, an inadequately specified matched baseline or inconclusive instrument output): the instrument or baseline is sharpened before a verdict is assigned. The criteria are exhaustive at the per-measure level, and the cycle’s overall decision is the strongest verb any measure triggers: escalate dominates adapt, which dominates refine, which dominates keep. The ordering reflects the design-economy principle that mechanism-level failures cannot be discharged by attribute-level repairs and that indeterminate measures must be resolved before they are relied upon.

Inter-cycle dependencies and integrity rules

The cycles form a dependency chain rather than a parallel set. Cycle 1 produces semantically stabilised representations; without stable anchors, Cycle 2 cannot define meaningful module boundaries, because the concepts its interfaces govern would be referentially unstable. Cycle 2 produces declared interface contracts; without these, Cycle 3 cannot formalise a grammar that must operate over declared interfaces rather than unstructured text. Cycle 3 produces a formal notation with deterministic parse and replay; without executable logic, Cycle 4 cannot implement rule-compliant generation. This mirrors the theory-to-artefact derivation in Chapter 3 Section 3.5, in which Proposition 1 enables Proposition 2, which enables Proposition 3, which enables Proposition 4; the ordering is a logical consequence, not arbitrary scheduling. Two integrity rules govern all cycles. First, negative and boundary-limited outcomes must be explicitly recorded (a cycle reporting only positive results is incomplete), which operationalises DSR Principle P6.¹⁸ Second, each cycle must reference at least one evaluation measure and one evidence object, so that cycles cannot become narrative accounts of construction without verifiable evaluation. Chapters 6 to 9 are bound to the same discipline: a minimum of three documented iterations each, with at least one negative or boundary-limited outcome.

dsr_cycle_architecture — **DSR Cycle Architecture** The four design-evaluate cycles (Cycle 1 to Cycle 4) and their inter-dependencies. Each cycle targets one design proposition (Proposition 1 to Proposition 4) and produces one artefact (Chapters 5 to 7, 9). Sequential dependencies flow from semantic stabilisation (Cycle 1) through interface bounding (Cycle 2) and transformation reliability (Cycle 3) to governed variation (Cycle 4); Proposition 5 spans all four cycles as cross-cycle integration.

Design cycle evidence and the Proposition 5 cross-cycle position

The cycle structure has already been exercised in the standardisation-schema programme. Its design-cycle register documents nine explicit iterations (0 through 7b) across four analytical cycles, comprising 39 total experiments that span corpus-integrity baselining, polysemy measurement, schema construction, reproducibility stress testing, adversarial remediation, and integrated assessment (see Chapter 5 Section 5.8). Each iteration records intent, change, evaluation, and decision in the four-step format, including negative outcomes (eight duplicate clause identifiers retained as a design constraint rather than suppressed) and boundary-limited acceptances such as holdout generalisation that demonstrates coverage stability while acknowledging domain-specificity limits. The register is concrete evidence that the discipline is executable rather than aspirational. Proposition 5 (integrated utility under diachronic burden) occupies a cross-cycle position: it is informed by each cycle’s evidence but adjudicated only at the summative level in Chapter 10, where the outputs of Cycles 1 to 4 are exercised together on matched dwelling cases. Its operationalisation (the materiality threshold, composite rule, and learning-effect protocol) is specified in Section 4.5.

The data pipeline

Data collection and transformation are organised as provenance-safe pipelines so that every claim traces from requirement origin through design-feature instantiation to evaluation output. Provenance is a methodological necessity for DSR rigour rather than an administrative convenience: Hevner et al.’s Guideline 5 requires a transparent and repeatable construction and evaluation process,¹⁹ and provenance control prevents two specific failures: orphan claims (assertions not linked to a declared requirement) and ghost requirements (requirements no design feature addresses). The architecture has four control layers, each transforming the prior layer’s outputs.

Requirement intake layer. The six environment requirements (ER-01 to ER-06), derived from the problem analysis in Chapters 1 and 2, become coded requirement cards with explicit stakeholder anchors, governance functions, and downstream traceability links. The requirements are testable obligations with declared measures and falsifiers, not aspirational goals.

Design feature layer. The design features (DF-05A-01 to DF-08D-02) are artefact capabilities instantiated across the construction chapters, each satisfying one or more requirements through a declared mechanism object. A no-orphan integrity rule binds every requirement to at least one feature and every feature to at least one requirement; the RDE traceability matrix enforces this across all nine rows.²⁰

Measure layer. The evaluation measures (EM-4W-01 to EM-09-04), defined in Section 4.4, carry pre-registered comparator logic with no post hoc threshold adjustment. They are declared before construction and remain fixed unless a cycle explicitly revises them with documented justification.

Evidence layer. The evidence objects (EVID-P1-INTERPRETABILITY through EVID-P5-BURDEN) are the data artefacts that support or refute proposition claims. Each carries an uncertainty annotation declaring what it can and cannot support, which guards against over-interpretation of partial results.

The data-object transformation table specifies, for each class, its source path, transformation, consumer, and risk control.

Data Object	Source Path	Transformation Step	Consumer	Risk Control
Requirement cards	Problem analysis dossier and register modules (Chapters 1 and 2)	Requirement coding and normalisation	Section 4.4; Chapter 10 evaluation	Coding rules and audit trail
Design feature catalogue	RDE traceability matrix	Feature-to-requirement linking with mechanism-object classification	Chapters 5 to 9 artefact chapters	No-orphan linkage checks
Measure specifications	Chapter 4 evaluation workbench	Threshold and baseline registration with pre-declared comparator logic	Section 4.4; Chapter 10 evaluation	Pre-registered comparator logic; no post hoc adjustment
Evidence objects	Case outputs and result logs (Chapters 5 to 9)	Aggregation and adjudication against declared measures	Chapter 10 evaluation and contribution extraction	Uncertainty annotation; declared evidence-boundary statements

A binding repeatability principle governs every step: each transformation must be reproducible under declared settings, and a step that cannot be reproduced cannot be cited as evidence. The principle requires the same analytical conclusions from the same inputs and settings, not identical numerical output across all environments. Chapter 5 Section 5.14 demonstrates this through the reproducibility stress test of the standardisation-schema serialisation pipeline. This module owns the pipeline architecture; the section is titled to reflect that the thesis involves genuine computational pipelines (serialisation, parsing, corpus transformation, and rule-compliant generation) rather than conventional data-gathering procedures.

data_pipeline_architecture — **Four-Layer Data Pipeline Architecture** The four-layer data pipeline governing provenance across the thesis. The requirement-intake layer (ER-01 to ER-06) feeds the design-feature layer (DF codes), which feeds the measure layer (EM codes), which feeds the evidence layer (EVID objects). Three control principles (no-orphan integrity, repeatability, and provenance traceability) govern all inter-layer transformations.

Corpus provenance and the census basis of evidence

The primary data source for Cycle 1 is the NDIS Specialist Disability Accommodation Design Standard, Version 1 (October 2019), comprising 611 clause records across 25 design elements plus 189 design requirements extracted from 19 normative figures.²¹ Its extraction method, integrity controls, and scope declarations are owned by Chapter 5 Section 5.3, preserving the anti-overlap boundary between the method chapter and the artefact chapter. Two provenance commitments apply. The corpus is treated as a closed regulatory text, so generalisability to other standards instruments is a declared boundary condition rather than an assumption. The corpus integrity audit (zero missing keys, zero truncation markers, eight flagged duplicate identifiers retained as design constraints) establishes a baseline quality level that downstream transformations must maintain or document departures from.

The second primary data source is the empirical substrate of Chapter 8, which grounds the evaluation cycles cross-cuttingly rather than constituting a numbered design cycle of its own (the four cycles map to Chapters 5, 6, 7, and 9; Chapter 8 supplies the substrate they are exercised against): a corpus of 745 post-validation single-component Australian residential floor plans, assembled in two construction strata of 572 plans in the October stratum and 173 in the August stratum. The size was chosen on two grounds, both design rationale rather than statistical representativeness: analytical adequacy, the corpus being large enough to expose distributional regularity and stabilise the coupling structure while small enough to preserve per-plan agent-manual rigour; and coverage context, since 745 plans enumerate roughly one in ten thousand of Australia’s national separate-house stock (the 2021 Census records 10,852,208 private dwellings, of which 70.1 per cent are separate houses, approximately 7.6 million) offered as an indication that the corpus is non-trivial in scale and explicitly not as a basis for statistical generalisation to that stock.²² The codification through which the corpus was extracted (the space-category taxonomy, the extraction ruleset, and the coupling-classification method) was itself worked out through staged piloting in the design-science manner: a superseded application-programming-interface extraction iteration that was trialled and pivoted to agent-manual multimodal-vision extraction, a first-batch validation pilot of roughly the first fifty plans that calibrated the ruleset, and then the full corpus processed in monitored waves. It is extracted under the agent-manual canonical protocol whose extraction method, staged development record, and dimensional, occurrence, topological, and configurational analyses are owned by Chapter 8 Section 8.10, again preserving the anti-overlap boundary. The corpus is a complete census, not a probability sample, of the screened stock for the strata it enumerates: every plan that passes the categorisation gate is included, and no sampling frame mediates between the strata and the records analysed. This status is methodologically load-bearing, and its consequence is binding for the whole thesis. Inferential statistics exist to quantify sampling variability: the uncertainty about an unknown population parameter estimated from a sample. Over a census there is no sampling variability to quantify, so a confidence interval, significance test, or resampling distribution computed on it has no referent. Accordingly, no inferential statistic is reported anywhere in this thesis: no p-value, confidence interval, significance test, chi-square, analysis of variance, significance-bearing regression or correlation, and no bootstrap, permutation, or jackknife resampling. This restraint is the more rigorous choice, not a concession: a read-only survey of eight comparable Design Science Research theses in the built-environment, mass-customisation, and BIM literatures found that none reports an author-computed inferential statistic, and all use descriptive evidence.²³

Three drafting commitments follow and bind the quantitative reporting in Chapters 5, 6, 8, and 10.

Descriptive by default. Quantitative evidence is reported as counts, proportions, frequencies, coverage, agreement rates, and runtimes, each with explicit denominators.

Thresholds as transparent design decisions. Every numeric threshold is stated as a design decision, with its sensitivity shown by plain counts at the adjacent cut-offs rather than by a resampling distribution.

Validation reserved for functional demonstration. The word “validation” denotes functional demonstration, not statistical confirmation, consistent with the artificial-efficacy evaluation posture set out in Section 4.5.

Where a coupling or adjacency pattern is reported over the corpus, it is reported as a realised-frequency class; absence of a pattern at the census’s co-occurrence floor is reported as such and is not asserted as a prohibition, since a census establishes what the stock realises rather than what it forbids. So constrained, the corpus supplies descriptive ecological grounding and population-level coverage for the artefact’s primitives, and a reference distribution against which the demonstration trajectory of Chapter 10 is positioned descriptively, never a sample from which the artefact’s properties are statistically generalised. Section 4.4 specifies the evaluation strategy that consumes these data objects and applies the pre-registered questions, baselines, thresholds, and adjudication rules.

4.4 Evaluation Questions and Measures

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures (this section)
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract

The evaluation is requirement-linked and proposition-linked: its unit is testable correspondence between environment requirements, design features, and declared measures, not narrative plausibility or artefact attractiveness. A single principle governs the design. Every evaluation question must trace to at least one environment requirement, at least one proposition, and at least one pre-declared measure; a question that cannot demonstrate these linkages falls outside the contribution line and does not enter the protocol. The discipline matters because Design Science Research evaluation degenerates into post hoc justification when artefact merits are argued narratively rather than tested against pre-declared criteria.²⁴ Pre-registration (declaring questions, baselines, thresholds, and adjudication rules before results are reported) prevents the thesis from adjusting its success criteria to match its outcomes. The artefact suite can therefore fail honestly, and the protocol is built to make any such failure transparent rather than deniable.

Consolidated requirements-design-evaluation traceability matrix

The requirements-design-evaluation traceability matrix below maps the requirement, design, and evaluation constructs across Chapters 2 to 5 so their alignment is demonstrable rather than assumed. Each chapter introduces criteria at a different level of abstraction (feasibility criteria, meta-requirements, propositions, evaluation questions, success criteria) and the matrix threads them into a single chain from environment to evidence.

fig-ch4-03-evaluation-traceability-map — **Evaluation Traceability Map** Network traceability map linking environment requirements (ER) through propositions (P1 to P5) and evaluation questions (EQ-01 to EQ-05) to evidence objects (EVID). Each evaluation question inherits its success criteria from the propositions and environment requirements it serves, so no evaluation claim is made without a traceable requirement origin.

Requirements-Design-Evaluation Traceability Matrix

Feasibility framing (per meta-requirement)	Ch 2 Meta-Requirement	Ch 3 Proposition	Ch 4 Evaluation Question	Ch 5 Success Criterion	Evidence Object
Reversibility	MR1 (Semantic continuity)	P1 (Interpretive convergence)	EQ-01 (Interpretive divergence)	SC-01 + SC-02	`EVID-P1-INTERPRETABILITY`
Cognitive tractability	MR2 (Bounded composability)	P2 (Interface-bounded verification)	EQ-02 (Verification scope)	SC-04	`EVID-P2-CHECK-SCOPE`, `EVID-P2-LOCAL-GLOBAL`
Coordination cost	MR3 (Formal expressibility)	P3 (Replay reliability)	EQ-03 (Round-trip preservation) [primary]; EQ-01 + EQ-02 [secondary]	SC-03	`EVID-P3-REPLAY`, `EVID-P3-INVARIANTS`
Institutional compatibility	MR4 (Procedural traceability)	P4 (Governed variation)	EQ-04 (Governed variation)	SC-05	`EVID-P4-VARIATION`
Temporal sustainability	MR5 (Evaluation-readiness)	P5 (Integrated utility)	EQ-05 (Integrated burden)	None (Chapter 10 scope)	`EVID-P5-BURDEN`
Skill cost	MR1 + MR2	P1 + P2	EQ-01 + EQ-02	SC-01 + SC-02	`EVID-P1-SDA-TRACE`

No feasibility criterion is orphaned, and no success criterion floats without a methodological anchor. The sixth row carries the cross-cutting skill-cost criterion, which maps to a composite of two meta-requirements and two questions rather than a single pair. Skill cost is composited from three observable components: lookup cost (locating the relevant clause or primitive in the serialised schema, addressed by MR1), recall cost (retaining the primitive vocabulary and bounded-composability rules across a verification episode, addressed by MR2), and application cost (applying the recalled primitives to a concrete step, addressed jointly by MR1 and MR2). Each component is rated low, moderate, or high against the matched-baseline equivalent, and the composite is the worst of the three: a conservative rule that prevents a low-lookup, low-recall artefact from masking a high application cost. The composite is reported alongside its three components, so the reader can inspect which dimension drives the classification.

Evaluation questions

Five evaluation questions govern the assessment programme. Each addresses a governance function, links to environment requirements and propositions, and is adjudicated through declared measures and evidence objects. Together they discharge the propositions that test the single architectural claim; they do not pre-empt the engine specified in Chapter 6.

EQ	Question	Linked ER	Linked Measures	Proposition Anchors	Expected Evidence
EQ-01	Does serialised semantic representation reduce interpretive divergence?	ER-01, ER-04	`EM-4W-01`, `EM-09-01`	P1 (primary), P3 (secondary)	`EVID-P1-INTERPRETABILITY`, `EVID-P1-SDA-TRACE`
EQ-02	Does interface-bounded modularity reduce verification scope?	ER-02, ER-05	`EM-4W-02`, `EM-09-02`	P2	`EVID-P2-CHECK-SCOPE`, `EVID-P2-LOCAL-GLOBAL`
EQ-03	Does the encoded artefact preserve invariants under the round-trip cycle?	ER-01, ER-02	`EM-09-01`, `EM-09-02`	P3	`EVID-P3-REPLAY`, `EVID-P3-INVARIANTS`
EQ-04	Does platform governance enable legitimate variation under shared rules?	ER-06	`EM-4W-03`, `EM-09-04`	P4	`EVID-P4-VARIATION`, `EVID-P4-EXCEPTION-BUDGET`
EQ-05	Does the integrated suite reduce practical burden while maintaining reliability?	ER-03, ER-05	`EM-09-03`	P5	`EVID-P5-BURDEN`

EQ-01 tests Proposition 1 primarily and Proposition 3 secondarily; its baseline is interpretive divergence under current synchronous-artefact workflows, and its threshold is a material reduction in divergence. EQ-02 is the primary test of Proposition 2: the baseline is monolithic representation, and the threshold is a shift from global to local scope dominance under bounded edits. EQ-03 is pre-registered on its own measures rather than inherited from EQ-01 and EQ-02, so Proposition 3’s evidence chain stands independently; the baseline is narrative or tacit change procedures, and the threshold is deterministic replay, full invariant retention, and parser rejection of invariant-violating inputs under the round-trip cycle. EQ-04 tests Proposition 4 against variation without a stable shared core, with the threshold that every variant sits within the pre-registered exception budget and every exception is typed and justified. EQ-05 is the most demanding question because it tests the integrated system rather than any single component; its operationalisation, including the Proposition 5 composite criterion, is specified in Section 4.5.

Evaluation measure specifications

The measures are pre-registered with metric, unit, baseline, threshold, data source, and validity risks; they are fixed unless a design cycle revises them with documented justification. Within-chapter measures (EM-4W-*) run during formative evaluation inside individual artefact chapters.

Measure	Metric Definition	Unit	Baseline	Threshold	Data Source	Validity Risks
`EM-4W-01`	Interpretation divergence for matched obligations across handovers	Divergence rate	Synchronous-artefact workflow divergence for equivalent obligations	Lower than baseline with practitioner-detectable effect	Within-chapter protocol logs	Coder bias; sampling limits
`EM-4W-02`	Local-to-global verification scope ratio per bounded change	Ratio	Scope ratio under current monolithic-representation workflow	Local scope dominates under bounded edits	Change-trace logs	Misclassified dependencies
`EM-4W-03`	Rule-compliant variant generation rate under exception budget	Compliance proportion	Manual exception-heavy variation process	Meets pre-registered exception budget	Generation and exception-record logs	Scenario representativeness

Summative measures (EM-09-*) run during the Chapter 10 evaluation programme.

Measure	Metric Definition	Unit	Baseline	Threshold	Data Source	Validity Risks
`EM-09-01`	Standards-interpretability ambiguity-reduction across demonstration cases (aggregate resolution rate computed from clause-level trace-completeness inputs)	Aggregate resolution rate	Baseline ambiguity profile under unserialised regulatory text	Aggregate delta ≥ 0.25 relative to baseline (design decision; see below)	Chapter 10 case-output analysis	Instrument definition drift
`EM-09-02`	Modular-fit and bounded-check performance under change scenarios	Composite score	Baseline modular-fit under monolithic representation	Improved bounded verification relative to baseline	Chapter 10 modular-fit and change-trace outputs	Comparator mismatch
`EM-09-03`	Workflow burden delta across time, cognitive load, and skill-dependency proxies	Delta (multi-dimensional)	Baseline burden profile for matched tasks	Burden reduction with declared limits; must satisfy the P5 composite criterion	Chapter 10 workflow-comparison outputs	Confounding process changes; learning effect
`EM-09-04`	Exception governance quality for generated dwelling variants	Quality score	Baseline unguided exception handling without shared rules	All exceptions typed and justified; rate within declared budget	Chapter 10 variation and exception evidence	Subjective coding risk

These specifications align with the canonical property-question-measure-evidence wiring: Proposition 1 → EQ-01 → EM-4W-01 → EM-09-01; Proposition 2 → EQ-02 → EM-4W-02 → EM-09-02; Proposition 3 → EQ-03 → EM-09-01 + EM-09-02; Proposition 4 → EQ-04 → EM-4W-03 → EM-09-04; Proposition 5 → EQ-05 → EM-09-03. The EM-09-02 indicator, modular-fit and bounded-check performance, is the pre-registered summative measure for P2; any concrete sub-metric coined downstream operationalises it and does not displace it as the canonical name. The consolidated form of this wiring, which adds the motivating property, meta-requirement, and evidence objects for each proposition, is the Canonical Property-Proposition-Evaluation Map (Appendix F.5).

The EM-09-01 definition fixes a three-term relationship to forestall metric drift. Trace completeness is the clause-level input: the proportion of obligation-relevant identity, role, deontic, lexical, mapping, and cross-channel anchors traceable from clause to evaluation output. The schema-mediated resolution rate is the derived quantity computed from those inputs against the baseline. Ambiguity-reduction is the framing-level effect that the derived quantity operationalises. This relationship holds for the duration of the programme, and the artefact chapters report against resolution rate as the operative pass/fail surface.

The EM-09-01 threshold of an aggregate resolution rate of at least 0.25 is a transparent design decision, not a statistically derived cutoff. It is anchored to Berry and Kamsties’ synthesis of inspection-based ambiguity-reduction effect sizes, which reports the smallest reduction practitioners can reliably detect through structured inspection.²⁵ Sensitivity is reported by plain counts at adjacent cut-offs rather than by any interval or significance test. The same 0.25 floor binds both Chapter 5 surfaces on which the resolution rate is reported, the chapter-internal computation and the cross-channel reconciliation, so a looser cutoff cannot govern one surface and render the headline finding non-falsifiable.

The exception budget named in the EM-09-04 threshold is a single-digit ceiling: fewer than ten typed Rule-4 exceptions across a demonstration trajectory. The ceiling is a transparent design decision rather than a statistically derived cutoff, and its rationale is architectural. A trajectory that cannot be expressed without reaching double digits of exceptions indicates that the governed kernel itself, rather than the variant layer, requires revision; the budget is therefore the point at which accumulated exceptions become evidence of kernel inadequacy. The count is reported descriptively against this ceiling, showing how far a trajectory sits below it, and the EM-09-04 verdict combines it with the typed-and-justified criterion so that a within-budget count carries only when every exception is also typed against its governance basis.

Proposition baselines are specified at the directional level in this chapter, because the methodology defines the evaluation architecture rather than domain-specific instruments. Each artefact chapter concretises a directional baseline into measured values: Chapter 5 Section 5.11 converts the EQ-01 specification into a six-dimensional empirical baseline. This two-stage choice prevents premature commitment to instruments that the design search may need to adapt.

FEDS efficacy classification and evaluation-pattern mapping

The strategy is classified as Technical Risk and Efficacy within Venable, Pries-Heje and Baskerville’s FEDS framework: the artefact suite is technically complex (formal grammar, graph topology, computational pipelines), so the principal risk is that it does not perform as specified rather than that users reject it, and the evaluation uses constructed demonstration cases that isolate technical performance from adoption dynamics.²⁶ The strategy combines formative episodes within each artefact chapter (EM-4W-*) with summative episodes in Chapter 10 (EM-09-*). The declared evaluation ceiling is therefore efficacy under constructed demonstration, not effectiveness under naturalistic field deployment; naturalistic evaluation is reserved as future work.

Sonnenberg and vom Brocke’s four-pattern taxonomy locates each episode temporally and is descriptive of position only, never a defence of evidential quality.²⁷ EVAL1 (problem identification) sits in Chapters 1 and 2; EVAL2 (solution design) in Chapters 3 and 4. EVAL3 (solution instantiation) covers artefact construction and within-chapter formative assessment; its measurement-protocol boundary admits schema-correctness, internal traceability, primitive coverage, and the EM-4W-* measures, and excludes integrated cross-artefact behaviour and any naturalistic generalisation. EVAL4 (solution in use) is the artificial-summative assessment of the integrated suite against EQ-01 to EQ-05 on two constructed demonstration cases; its boundary excludes naturalistic field deployment, longitudinal adoption, and inferential generalisation beyond the constructed scenario class.

Method choice and the evidence-balance rule

The evaluation combines three method types, each addressing a distinct evidential need. Analytical checks verify rule-level correctness (deterministic replay, invariant retention, no-orphan traceability, coverage completeness) yielding binary pass/fail evidence applied within artefact chapters. Observational traces capture change-propagation evidence: how edits propagate, what checks are triggered, and what scope ratios result. Descriptive interpretation assesses workflow-level outcomes (interpretability, tractable governance overhead, and the burden profile) as qualitative judgements bounded by the matched-scenario context. The evidence-balance rule then requires that no major Chapter 10 claim about real-world applicability rests on artificial evidence alone, and conversely that no formal-property claim rests on interpretive judgement alone.

4.5 Adjudication, Thresholds, and the P5 Composite Criterion

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion (this section)
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract

Each evaluation measure carries an explicit baseline and threshold declared before execution, and each result is classified into one of four categories rather than a binary verdict. A measure passes when the threshold is met with supporting evidence from at least one linked EVID-* object. It fails when the threshold is not met and no boundary explanation accounts for the shortfall. It is indeterminate when the evidence is insufficient to classify the result as pass or fail: for example, an inadequately specified matched baseline, inconclusive instrument data, or inadequate coverage of the relevant cases. It is a boundary-limited acceptance when the threshold is partially met under a declared scope constraint that explains the boundary: for example, a measure passing within the tested scenario class but not assessable for untested scenarios. Every major claim in Chapter 10 must cite at least one linked evidence object; claims without evidence linkage are interpretive commentary, not contribution assertions. The four-result scheme is non-standard in design science research, and the departure is deliberate. Binary pass/fail classification suits evidence that is unambiguous and fully controlled. A multi-artefact thesis addressing a complex governance problem instead produces partial, boundary-conditioned, or mixed evidence, and the indeterminate and boundary-limited categories let the evaluation report that honestly rather than force premature classification. These are diagnostic categories, not escape hatches: each signals where evidence falls short and what further work would resolve it.

The chapter-level thresholds are directional rather than numerical. They pre-register the direction of expected effects (reduction, improvement, bounded scope) without committing to a magnitude. The justification is methodological. The thesis evaluates a nascent design theory at what Gregor and Hevner term a Level 2 contribution, applying established theoretical foundations to a first formalisation of accessibility-standard clause governance.²⁸ Prior requirements-ambiguity work establishes that structured inspection can reduce ambiguity to a practitioner-detectable degree, and it is on that benchmark that the instance-level thresholds below are anchored; what it does not supply is a calibrated effect-size baseline for clause-governance schemas of this specific kind. For a first formalisation, directional pre-registration is therefore the appropriate posture: an arbitrary numerical cutoff (say, a required 30% ambiguity reduction) would impose a false precision the evidence base does not support, either too lenient or too strict. What the thesis does pre-register remains substantial: the direction of effect for each evaluation question; the evaluation instruments, with defined metrics, units, baselines, and data objects; the named evidence objects linked to each proposition; and the adjudication logic, comprising the four-result scheme and the Proposition 5 composite criterion. Pre-registration covers the evaluation architecture, not the effect sizes, and it retains analytical teeth when the instruments are precise and the adjudication is honest.

Numerical thresholds as transparent design decisions

Where citation-grounded numerical thresholds can be derived from prior empirical work, three are pre-registered for the instance-level measures in the Chapter 5 standardisation-schema evaluation. They are transparent design decisions anchored to a published benchmark, not statistically derived cutoffs, and their sensitivity is shown by plain counts at adjacent cut-offs rather than by any inferential apparatus.

Aggregate resolution rate ≥ 0.25, interpretable as a 25-point reduction on the unified ambiguity index. This is the smallest reduction that Berry and Kamsties’ synthesis of inspection-based ambiguity reduction reports as practitioner-detectable through structured inspection rather than aggregation.²⁹ A per-dimension floor of ≥ 0.10 binds the three resolution-mode dimensions (Identity, Structural, Deontic), preventing one dominant dimension from carrying the headline number; the two dual-mode dimensions (Lexical polysemy and Mapping primitives) discharge instead through a governed-transparency criterion, the schema codifying them as auditable, confidence-labelled design objects.
Holdout vocabulary coverage ≥ 80%, the lower-bound comparable target for domain-restricted vocabulary transferability under bounded-schema conditions.³⁰ Below 80% the primitive set fails to absorb new corpus material without schema revision, which would falsify the transferability claim.
Polysemy proportion ≥ 0.50 at the headline lexical layer: the disambiguation contribution is warranted only where the corpus carries enough lexical polysemy to require governance, so this is a floor the corpus must clear, not a ceiling it must stay under. The 0.50 floor is comparable to the technical-domain regulatory-text polysemy reported in prior requirements-ambiguity work.³¹ This floor is the operative form of the criterion, consistent with the design-floor framing at Chapter 5, Section 5.8. Sensitivity is reported descriptively as the counts of clauses falling at the adjacent cut-offs of 0.50, 0.55, and 0.60, so that the reader can see how far the corpus sits above the floor without any resampling or interval claim.

The ≥ 0.25 aggregate resolution rate threshold is binding across both Chapter 5 measurement surfaces, the chapter-internal per-clause and aggregate computation in Section 5.20 and the cross-channel reconciliation in Section 5.25, so that no looser cutoff governs the cross-channel measure (XCC-INT-03) and renders the headline finding non-falsifiable. The discrimination this register supports is descriptive, not inferential: the per-clause and aggregate deltas, the null-model contrast, and the reduced-schema layer analysis are read as descriptive comparisons that show where the framework differentiates conditions, consistent with the census-not-sample evidential basis established in Section 4.3.

The P5 composite criterion

Proposition 5 (integrated utility under diachronic burden) requires improvement on a composite criterion (reduced practical burden and maintained or improved assurance reliability) rather than on any single metric, and it is operationalised here for direct application in Chapter 10. The operationalisation discharges the obligation inherited from Chapter 3 Section 3.5, where the falsifiable floor and the learning-effect confound were identified but their procedural resolution deferred.

Materiality threshold. A material improvement is one detectable by a practitioner in a matched-scenario comparison. The standard is framed in practical, not statistical, terms because the demonstration uses constructed demonstration cases for which statistical aggregation would be inappropriate. The evaluator records, for both the artefact-supported and the matched-baseline workflow on the same scenario, three proxies: verification-step count, decision points requiring interpretive judgement, and elapsed workflow time. When the artefact-supported workflow shows a reduction on at least two of the three proxies, visible in the scenario-level data without aggregation, the difference is classified as material. The standard is conservative (it may classify genuinely small improvements as non-material), but conservatism suits a nascent theory’s first empirical claims.

Composite adjudication rule. Both burden and reliability must be non-negative relative to the matched baseline, and at least one must improve. The result is pass (burden-led) when burden improves while reliability holds or improves; pass (reliability-led) when reliability improves while burden holds; fail when either component is purchased at the cost of the other, or when neither improves; and indeterminate when the evidence is insufficient to classify either component. This rule avoids any arbitrary weighting of burden against reliability: it does not assert that a burden gain compensates for a reliability loss, because such trade-off ratios cannot be justified a priori.

Learning-effect separation protocol. Burden reduction after the artefact suite is introduced may partly reflect familiarity with any new workflow rather than the governance mechanism. To distinguish the two, the matched-baseline workflow receives equivalent exposure time. Reduction in the artefact condition but not in the matched baseline supports the mechanism claim; comparable reduction in both flags the learning-effect confound and classifies Proposition 5 as indeterminate for mechanism attribution; a substantially larger reduction in the artefact condition partially supports the mechanism, with the learning-effect contribution explicitly quantified. Separating mechanism from familiarity in this way presupposes an arbiter independent of the artefact’s construction; under the constructed-demonstration scope of Chapter 10, where the candidate is the sole arbiter and no external practitioner comparator is available, the separation cannot be run, so mechanism attribution is reported as indeterminate and the integrated-utility verdict is bounded by the absence of that external comparator. The protocol does not eliminate the confound (full elimination would require longitudinal controlled work beyond a single doctoral study), but it manages the confound by making it visible and classifiable.

Non-negotiable falsifiable floor. From Chapter 3 Section 3.5: the artefact suite must not increase verification burden for equivalent assurance quality. If governance overhead exceeds the verification savings it generates, the platform claim fails regardless of any other technical merit. This floor tests the thesis’s core value proposition, that explicit representational governance reduces rather than increases the cost of housing-adaptation verification. A positive Proposition 1, Proposition 2, Proposition 3, and Proposition 4 result combined with a failing Proposition 5 would mean the individual artefacts work as designed while the integrated system imposes more overhead than it saves: an honest and informative finding, but not one that would support the prescriptive design-theory claim.

Proposition-level aggregation: the promotion scheme

The four-result scheme classifies each measure. A second register operates one level up, at the proposition, because a proposition is carried by several measures across several result domains, and its standing is a judgement about their convergence rather than a reading of any single measure. Chapter 10 applies both registers: it adjudicates result domains at the measure level under the four-result scheme, then aggregates those domains to the proposition level under the promotion rule defined here. The two registers are complementary, and the proposition-level register is the one that Chapter 11 and Chapter 12 inherit as authoritative for contribution claims.

A proposition carries one of four statuses. It is SUPPORTED at HIGH confidence when converging result domains carry the property within the trajectory’s scope and the property’s deciding axis is exercised to its full extent across the regulatory expansion without schema-level or kernel modification. It is SUPPORTED at MODERATE confidence when that converging support holds but rests on a structurally narrower corpus, such as a single-fork trajectory rather than a multi-trajectory cohort, that a broader demonstration could still overturn; the grade records breadth of exercise, and the narrowing is registered as a declared future-work item. It is DECLARED-LIMITED when the positive evidence is genuine but bounded, whether by inheritance from an earlier chapter or by the candidate-driven scope of the demonstration, with the boundary stated explicitly and routed to future work. It is REJECTED when a measure records unambiguous threshold non-attainment and no boundary explanation accounts for the shortfall.

The line between the last two statuses is the recoverable boundary. A DECLARED-LIMITED verdict attaches an explicit, recoverable scope limit to favourable evidence, whereas a REJECTED verdict marks threshold failure with no such boundary. A disconfirming outcome that carries a recoverable boundary, such as an inherited round-trip warrant or a candidate-driven scope limit, is therefore logged as a boundary condition rather than a rejection, and rejection is held in reserve for the unambiguous failure its name implies. The label DECLARED-LIMITED denotes a proposition verdict here, distinct from the identically named data-limitation code applied to corpus findings in Chapter 8.

The promotion rule fixes how measure-level verdicts aggregate. A domain verdict promotes to SUPPORTED where a property is carried by converging result domains within the trajectory’s scope, at HIGH or MODERATE confidence according to the breadth criterion above, and to DECLARED-LIMITED where its evidence is bounded by inheritance or by candidate-driven scope. A measure-level boundary-limited acceptance is therefore an input to promotion rather than a proposition verdict in itself: the rule reads it together with its converging domains before a proposition status is assigned.

A contribution-bridge condition governs what crosses from the evaluation into the interpretation of contributions in Chapter 11. Only a claim with linked EM-09-* evidence and a declared validity limit is eligible for promotion; a SUPPORTED-at-HIGH verdict is eligible for unrestricted contribution-claim status; a DECLARED-LIMITED verdict is eligible only with its boundary carried forward; and no verdict authorises an unbounded claim. The condition binds the strength of a contribution claim to the grade and the boundary of the verdict that warrants it, so that the confidence a proposition earns at adjudication is the confidence it retains when it is read as a contribution.

Continuity commitments and temporal comparability

Four downstream continuity obligations ensure that Chapter 10 cannot silently omit evaluation responsibilities; they are binding method commitments, not optional targets. The Proposition 1 interpretability protocol requires Chapter 10 to report interpretive-divergence evidence using the EQ-01 protocol. The Proposition 2 change-scope metric requires it to report local-to-global verification scope ratios using the EQ-02 metric. The Proposition 4 variation-validity protocol requires exception-governance evidence using the EQ-04 protocol. The Proposition 5 comparative burden design requires burden and reliability evidence using the composite design above. If Chapter 10 does not address a declared obligation, the omission must be explicitly justified as a scope limitation rather than left unmentioned.

Two temporal-comparability specifications apply beyond the Proposition 5 matched baseline, discharging the requirement in Chapter 3 Section 3.6 to define a comparable starting state, intervention scope, and verification burden across change episodes. For Proposition 2 (EQ-02), the demonstration must compare verification scope before and after a bounded change that modifies elements within a single module without altering its interface contracts; where starting states differ materially in complexity, scope ratios are normalised against the declared module count. For Proposition 4 (EQ-04), the demonstration must compare variation governance with and without the platform architecture, assessing whether a requested legitimate variant is produced within the exception budget; where variation demand differs materially between scenarios, exception rates are reported relative to the declared variation space. Section 4.6 specifies the validity and reliability controls that protect these measures and evidence objects from the principal threats to their credibility.

4.6 Validity and Reliability

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability (this section)
4.7 Ethical Considerations and the Handoff Contract

Validity and reliability controls protect the Section 4.4 measures and evidence objects from the principal threats to their credibility. Section 4.4 specified five evaluation questions with pre-registered measures, baselines, and adjudication rules; the question now turns from what the evaluation tests to whether the testing instruments are defensible and whether the conclusions drawn from them are warranted given the conditions under which evidence was collected. We organise threats across three layers: technical validity (does the artefact behave as specified?), purpose validity (does it improve intended outcomes in context?), and instrument validity (are the measures defensible with their limits made explicit?). This three-layer structure is triangulated across three traditions rather than dependent on one. The DSR-specific literature contributes Larsen, Baskerville and Venable’s five-type framework, which distinguishes design, internal, construct, external, and statistical-conclusion validity for IS artefacts.³² The quasi-experimental tradition contributes Cook and Campbell’s canonical four-type taxonomy, which remains the reference for threats in field-setting evaluation.³³ The general research-methods tradition contributes Trochim and Donnelly’s scaffolding for validity reasoning across qualitative and quantitative designs.³⁴ Convergence across three independent traditions guards against two risks that a recent-only citation would carry: the novelty risk that the Larsen et al. taxonomy has not yet accumulated independent confirmation, and the translation risk that DSR-specific labels may not align cleanly with the canonical labels examiners use.

The three-layer model collapses Larsen et al.’s five-type decomposition, absorbing design and external validity into purpose validity and statistical-conclusion validity into instrument validity. The collapse is justified, not convenient: in this thesis’s evaluation envelope no evidential conclusion in Chapter 10 turns on which merged sub-type is in play. Consider the strongest test case, the EQ-05 burden-reduction claim evaluated on a constructed demonstration case. A design-validity, external-validity, or statistical-conclusion failure all resolve through the same instruments (the matched-baseline protocol, the pre-registered measure threshold with its practitioner-detectable materiality criterion, and the four-result adjudication scheme) and all produce the same mitigation. Studies that rely on inferential statistics with effect-size claims over a probability sample would warrant the full five-type decomposition; this thesis does not. Its corpus is a census and its demonstration uses constructed cases, so the collapse is a justified scope-fitting choice rather than a concealment of distinctions that would alter outcomes.

fig-ch4-05-validity-threat-matrix — **Validity Threat Matrix** The three-layer validity framework. Technical validity targets parser drift and dependency misclassification through deterministic replay checks; purpose validity targets case mismatch and contextual confounders through matched-baseline protocols; instrument validity targets coding bias and comparator instability through pre-registered thresholds, dual coding, and explicit uncertainty reporting.

Validity-layer controls. The table below links each layer to its measures, principal threats, and mitigation strategy.

Validity Layer	Core Question	Linked Measures	Principal Threats	Mitigation Strategy
Technical validity	Does the artefact behave as specified under declared operations?	`EM-4W-02`, `EM-4W-03`, `EM-09-02`	Parser or configuration drift; dependency misclassification	Deterministic replay checks; versioned interface controls
Purpose validity	Does the artefact improve intended outcomes in context?	`EM-09-01`, `EM-09-03`, `EM-09-04`	Case mismatch; contextual confounders	Matched-baseline protocol; scoped interpretation rules
Instrument validity	Are the measures defensible with limits made explicit?	`EM-4W-01` to `EM-09-04`	Coding bias; comparator instability; subjective scoring	Pre-registered thresholds; dual coding; uncertainty reporting

Technical validity guards against computational components diverging from their specifications, mitigated by deterministic replay and versioned interfaces; the standardisation schema’s reproducibility test in Chapter 5 Section 5.14 demonstrates both across Python 3.12 and 3.13. Purpose validity is the layer most relevant to Proposition 5, where the matched-baseline and learning-effect protocols from Section 4.4 are the primary instruments against case mismatch and contextual confounders. Instrument validity guards against systematic measurement bias, mitigated by pre-registered thresholds, dual coding, and uncertainty reporting on every evidence object.

Per-evidence threat register. Each evidence object faces specific threats that require targeted mitigation beyond the layer-level controls. EVID-P1-INTERPRETABILITY (semantic continuity) faces coder disagreement and coverage risk; mitigation is dual coding with inter-coder agreement and coverage reporting relative to the 611-clause corpus. EVID-P2-CHECK-SCOPE (bounded verification) faces hidden dependency escalation; deterministic replay traces change propagation against declared interface boundaries. EVID-P3-REPLAY (transformation replay) faces replay-boundary risk, where outputs deterministic within the tested input space may not hold at uncovered boundary conditions; mitigation is boundary-condition testing with declared input-space coverage. EVID-P4-VARIATION (governed variation) faces representativeness risk; scoped interpretation rules declare the tested and untested variation spaces, and exception budgeting types and justifies every governance exception. EVID-P5-BURDEN (integrated burden) faces a confounding process-change risk: the learning-effect confound, whereby observed burden reduction may reflect workflow novelty rather than the governance mechanism; mitigation is a matched baseline with equivalent exposure time, a pre-registered materiality threshold, and explicit classification of mechanism-caused versus familiarity-caused effects.

Three concrete criteria convert these layer-level controls into binding control surfaces, calibrated to the linked measures and to dual-coding and boundary-coverage conventions in the IS validity literature.³⁵³⁶ They pre-empt the common examiner critique that mitigations are stated generically by specifying numerical floors, a coverage threshold, and one additional per-threat mitigation.

Inter-coder agreement floor (Cohen’s κ). Wherever an evaluation step depends on human classification, the design specifies a Cohen’s κ agreement floor of 0.6 for primary measures and 0.4 for secondary measures. This is a reliability design decision (an inter-coder agreement threshold, a descriptive consistency check between two coders), not an inferential test, and it carries no associated significance claim. The 0.6 floor corresponds to the substantial-agreement band; primary measures are those whose pass/fail status carries a proposition-level claim (EM-09-01, EM-09-02, EM-09-03, EM-09-04). The 0.4 floor corresponds to the moderate-agreement band acceptable for the within-chapter EM-4W-* measures and subordinate dimensions. Below floor, an affected EVID-* object would be classified indeterminate under the four-result scheme rather than interpreted under the lower threshold, triggering a refine decision or a boundary-limited acceptance with the κ value reported. These floors specify a protocol for a future dual-coding exercise: the present study is conducted by a single researcher and does not execute external dual coding, so no κ is reported. The single-rater scope is stated as a material limitation in Chapter 11, Section 11.6, and the dual-coding exercise is declared future work; the floors are fixed in advance and would not be adjusted post hoc once applied.

Boundary-coverage threshold (stratum-cell visitation). Wherever an evaluation depends on coverage of a structured corpus, we pre-register an 80% boundary-coverage threshold: at least 80% of the corpus’s stratum cells must be visited (predicate type × deontic force × design category for the SDA standard, module-pair × bounded-edit class for modular-fit, design category × variant axis for variation). This is a coverage threshold, descriptive of which stratum cells the evaluation has visited, not a sampling inference over an unobserved population; the SDA corpus is a census of all 611 clauses, so the criterion governs completeness of visitation rather than representativeness of a draw. Coverage below 80% triggers a declared scope constraint that bounds the claim to the visited stratum cells and lists the unvisited cells. Where strata are not pre-defined (for example, novel exception types surfaced during Cycle 4), the post-hoc stratification is reported and the threshold re-checked against it.

Per-threat additional mitigations. Each threat class receives one further mitigation beyond the layer controls. Technical-validity threats add a container or environment manifest with an alternate-environment smoke test, exemplified by the Python 3.12/3.13 stress test in Chapter 5 Section 5.14 and binding for Chapters 6 to 8. Purpose-validity threats add a negative-case control, a constructed scenario in which the artefact is not expected to help, so observed effects are distinguished from generic enthusiasm. Instrument-validity threats add publication of the coding rubrics, decision rules, and the full disagreement register as appendix material, operationalising the Cook-and-Campbell convention of reporting unresolved threats rather than concealing them.

The construct of verification debt (the cumulative cost of deferred, avoided, or inadequately executed verification) is grounded in two established traditions so that it functions as an evaluative construct rather than an ad hoc invention. The information-quality literature establishes that data-quality degradation imposes rework, error-propagation, and decision-distortion costs;³⁷³⁸ deferred verification propagates comparable uncertainty through subsequent decisions. The coordination-cost literature establishes that information asymmetry and interface complexity impose transaction costs on multi-actor processes;³⁹⁴⁰ each handover lacking explicit interface contracts adds overhead the next actor absorbs. This dual grounding, carried forward as a binding commitment from Chapter 9 into Chapter 10, satisfies the construct-validity requirement.

Reliability is defined as repeated-execution consistency at two levels. Computationally, pipelines, parsers, and transformation operations must produce deterministic outputs under declared settings, verified by the replay checks in the technical-validity layer. Analytically, coding and classification must produce consistent results across coders or occasions, verified by dual coding and inter-coder agreement in the instrument-validity layer. A procedure that fails the reliability test is reported, not suppressed, and its evidence is reclassified in adjudication. Single-researcher reflexivity is a declared and bounded limitation (the same person designs, executes, and adjudicates), mitigated by three controls: pre-registration prevents post hoc criterion adjustment; the Section 4.7 ethical reporting rule prevents suppression of negative results; and dual coding on interpretive evidence such as EVID-P1-INTERPRETABILITY provides inter-coder checks independent of the primary researcher. Throughout, the thesis draws no inferential-statistical conclusion over its census and advances no sample-based effect-size claim; the κ and coverage thresholds remain descriptive design decisions, consistent with the census-not-sample governance of Section 4.3.

4.7 Ethical Considerations and the Handoff Contract

Chapter 4: Methodology

4.1 Introduction: From Theory Contract to Method Contract
4.2 Research Paradigm and the Inheritance from Theory
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
4.4 Evaluation Questions and Measures
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
4.6 Validity and Reliability
4.7 Ethical Considerations and the Handoff Contract (this section)

Ethical responsibility in this research is operationalised through the environment requirements rather than discharged as a perfunctory compliance section. No human participants, personal data, or organisational case studies requiring ethics-committee approval are involved. The primary data source is the publicly available NDIS Specialist Disability Accommodation Design Standard, a regulatory instrument published by the Australian Government.⁴¹ The evaluation programme in Chapter 10 uses constructed demonstration cases built on hypothetical dwelling configurations, not real dwellings or occupants, and conducts no interviews, surveys, or assessments involving identifiable individuals. The ethical obligations that remain are therefore artefact-quality and stakeholder-consequence obligations rather than consent and privacy obligations. The standardisation schema, modular library, notation grammar, and procedural pipeline each claim to improve housing-adaptation governance, and so carry responsibility for the quality of their outputs and the consequences of their adoption. Should those artefacts mislead practitioners, obscure genuine compliance requirements, or create new forms of exclusion, the consequences are ethically significant even where no participant is directly harmed.

Stakeholder consequences across the environment requirements

The stakeholder groups whose interests turn on the artefact suite’s success or failure are identified through the environment requirements, ER-01 to ER-06. The table below maps each requirement to its stakeholder focus, the ethical concern at stake, the risk that arises if the concern is left unmanaged, and the mitigation commitment carried by the research design.

ER	Stakeholder Focus	Ethical Concern	Risk if Unmanaged	Mitigation Commitment
ER-01	Households and residents	Life-course burden can be misrepresented	Interventions optimise paperwork over adaptability	Burden-sensitive measures and context notes
ER-02	Practitioners and assessors	Tacit interpretation load hides cognitive burden	Unsafe or overly conservative decisions	Explicit invariants and interface publication
ER-03	Project owners and providers	Utility narratives may overclaim causality	Misleading utility claims	Comparator discipline and bounded interpretation
ER-04	SDA participants and assessors	Feature-level pass can mask integrated usability failure	Exclusionary outcomes despite nominal compliance	Relation-level interpretability reporting
ER-05	Design and review teams	Specialist dependence may become structural inequity	Adaptation locked behind scarce expertise	Trigger-to-check transparency and delegation support
ER-06	Governance and maintenance actors	Exception opacity can normalise drift	Unjustified deviation from shared rules	Typed exception budgets with justification

Three requirements warrant comment beyond the table. The most ethically consequential group is ER-01, the people for whom the housing exists: residents who require accessible dwellings that adapt to changing needs. The risk is that a standardisation schema accelerates administrative compliance while leaving genuine adaptability untouched. The mitigation binds the workflow-burden measure (EM-09-03) to adaptation outcomes, treating documentation efficiency as secondary; if the artefact suite makes compliance faster without improving the adaptability of dwelling outcomes, that finding must be reported as such in Chapter 10 and Chapter 11. ER-04 raises a distinct concern: a schema that achieves feature-level compliance, satisfying each requirement in isolation, can still fail relational integrity, where the configuration of features does not support the intended function. The evaluation addresses this through relation-level interpretability assessment and through the Proposition 5 composite criterion, which requires the integrated system to demonstrate practical utility rather than component-level success alone. ER-06 concerns governance drift: the platform architecture permits legitimate variation through governed exceptions, but exception budgets can normalise drift if justified departures accumulate without review and erode the framework. The mitigation requires every exception to be typed and justified, with governance quality assessed as a summative measure (EM-09-04).

Alignment with the Myers and Venable principles

Six ethical principles for Design Science Research provide the normative frame for the thesis’s posture, and each is either operationalised within the method contract or set aside for cause.⁴² Assess benefits and harms is operationalised through the stakeholder-ER mapping above: each requirement names a group and the harms that artefact failure could produce, and the evaluation weighs intended benefits against potential harms. Informed consent is not applicable, since the research involves no human participants and draws on public regulatory text and constructed cases. Privacy safeguards are likewise not applicable, as no personal data is collected, processed, or stored. Honesty and integrity are operationalised through the ethical reporting rule set out below and through the pre-registration discipline that forecloses post hoc adjustment of success criteria. Intellectual property raises no conflict: the standard is public, and the analytical methods, schema, and notation grammar are original work. Artefact quality and safety is the most substantively applicable principle here, operationalised through the validity controls in Section 4.6, the falsifier discipline in Section 4.5, and the stakeholder-consequence analysis above; an artefact that claims to govern housing-adaptation standards carries quality and safety obligations that hold even in a research context.

The ethical reporting rule follows directly. Retention of negative results and accepted limitations across the Chapter 10 interpretation and the Chapter 11 synthesis is a binding obligation: when a proposition fails, the failure is reported with its diagnostic implications; indeterminate or boundary-limited evidence is surfaced under that classification rather than suppressed; and any governance overhead the artefact suite introduces unexpectedly is documented. The stakeholders named above are better served by honest failure than by concealed limitation.

The handoff contract

The method commitments established in Sections 4.2 to 4.6 bind the chapters that follow, and the cross-reference below records the resolution of the six methodological elements inherited from Chapter 3 Section 3.6, each now carried by a specific construct.

Inherited Element	Resolved In	Chapter 4 Construct
Unit of change	Section 4.4 design-feature layer	Design features `DF-05A-01` to `DF-08D-02`
Invariant format	Section 4.5 measure specifications	Evaluation measures `EM-4W-01` to `EM-09-04`
Trigger-to-check mapping	Section 4.3 cycle table	Cycles 1 to 4 with evaluation steps
Baseline	Section 4.5 comparator definitions	Matched-baseline conditions per EQ
Evidence objects	Section 4.5 evidence linkage; Section 4.6 threat register	`EVID-P1` through `EVID-P5`
Adjudication rules	Section 4.5 adjudication and P5 operationalisation	Four-result scheme; P5 composite criterion

Four downstream obligations are fixed. Artefact construction binds Chapters 5 to 9 to instantiate the design features under explicit mechanism objects, to demonstrate that those features satisfy their linked environment requirements, to execute Cycles 1 to 4 with documented intent, change, evaluation, and decision, and to produce the evidence objects specified in the cycle table. Summative evaluation binds Chapter 10 to execute evaluation questions EQ-01 to EQ-05, report measures EM-09-01 to EM-09-04 with linked EVID-* objects, address the four evaluation-continuity commitments, cite at least one evidence object per major claim, and apply the Proposition 5 operationalisation without modification. Contribution extraction binds Chapter 11 to produce the contribution rules TR-01 to TR-04 as scoped prescriptive outputs that remain within the evidence boundaries the evaluation establishes. Bounded transfer binds Chapter 12 to honour the three Chapter 3 boundary clarifications: representational governance does not resolve plural value conflict; modular and platform arrangements are conditional on governance quality; and no unrestricted transfer beyond the declared scope is claimed.

Each contribution rule carries an anticipated output form under the March and Smith taxonomy, specified here so that the contribution claim is not, in Chapter 11, post-hoc rationalised toward the most flattering category.⁴³ TR-01, the semantic-continuity rule for regulatory-text serialisation, is anticipated as a method with secondary construct properties. TR-02, the interface-bounded modularity rule for governance-domain artefacts, is anticipated as a model with secondary construct properties. TR-03, the executable-grammar rule for replayable transformations under invariants, is anticipated as a method with secondary instantiation properties, demonstrated through the RecPol/PlaniSyn instantiation in Chapter 7. TR-04, the platform-governance rule for legitimate variation under shared rules, is anticipated as a model with secondary method properties, paired with an exception-budgeting method. On our reading, these four are not four independent contributions: they are the anticipated forms through which a single architectural claim is discharged, Propositions 1 to 4 warranting claims at the model and method level rather than at the lexical or implementation level. Chapter 11 is bound to honour these forms unless the evidence base forces a downgrade, in which case the downgrade is declared rather than silently presented.

The shared vocabulary anchors carried forward to every subsequent chapter are the argument spine SP-01 to SP-04; the environment requirements ER-01 to ER-06; the evaluation measures EM-4W-01 to EM-09-04; the contribution rules TR-01 to TR-04; the evidence objects EVID-P1-INTERPRETABILITY through EVID-P5-BURDEN; and the adjudication categories of pass, fail, indeterminate, and boundary-limited acceptance. A completion condition keeps the contract binding throughout the evaluation: should Chapter 10 meet a measurement situation the pre-registered protocol does not anticipate, that condition must be declared as a scope limitation rather than resolved by inventing ad hoc criteria. The first artefact chapter to instantiate the contract is Chapter 5, which applies design features DF-05A-01 to DF-05A-03 to address ER-01, ER-03, and ER-04 through the standardisation schema for the SDA Design Standard, accepting the pipeline, cycle structure, evaluation protocol, validity controls, and ethical obligations defined here.

Next chapter: Chapter 5: A Queryable Schema for Accessibility Standards →

Notes

K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. ↩︎
K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
R. L. Baskerville, “Investigating Information Systems with Action Research,” Communications of the Association for Information Systems, vol. 2, no. 1, Article 19, 1999. ↩︎
R. K. Yin, Case Study Research and Applications: Design and Methods, 6th ed. Thousand Oaks, CA: Sage, 2018. ↩︎
J. G. Walls, G. R. Widmeyer, and O. A. El Sawy, “Building an Information System Design Theory for Vigilant EIS,” Information Systems Research, vol. 3, no. 1, pp. 36-59, 1992, doi: 10.1287/isre.3.1.36. ↩︎
J. F. Nunamaker, M. Chen, and T. D. M. Purdin, “Systems Development in Information Systems Research,” Journal of Management Information Systems, vol. 7, no. 3, pp. 89-106, 1991, doi: 10.1080/07421222.1990.11517898. ↩︎
W. Kuechler and V. Vaishnavi, “On Theory Development in Design Science Research: Anatomy of a Research Project,” European Journal of Information Systems, vol. 17, no. 5, pp. 489-504, 2008, doi: 10.1057/ejis.2008.40. ↩︎
A. R. Hevner, “A Three Cycle View of Design Science Research,” Scandinavian Journal of Information Systems, vol. 19, no. 2, pp. 87-92, 2007. ↩︎
S. Gregor and A. R. Hevner, “Positioning and Presenting Design Science Research for Maximum Impact,” MIS Quarterly, vol. 37, no. 2, pp. 337-355, 2013, doi: 10.25300/MISQ/2013/37.2.01. ↩︎
S. Gregor, “The Nature of Theory in Information Systems,” MIS Quarterly, vol. 30, no. 3, pp. 611-642, 2006, doi: 10.2307/25148742. ↩︎
S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251-266, 1995, doi: 10.1016/0167-9236(94)00041-2. ↩︎
J. Iivari, “Distinguishing and Contrasting Two Strategies for Design Science Research,” European Journal of Information Systems, vol. 24, no. 5, pp. 471-478, 2015, doi: 10.1057/ejis.2013.35. ↩︎
J. R. Venable, “The Role of Theory and Theorising in Design Science Research,” in Proc. DESRIST 2006, pp. 1-18, 2006. ↩︎
M. K. Sein, O. Henfridsson, S. Purao, M. Rossi, and R. Lindgren, “Action Design Research,” MIS Quarterly, vol. 35, no. 1, pp. 37-56, 2011. ↩︎
A. R. Hevner, “A Three Cycle View of Design Science Research,” Scandinavian Journal of Information Systems, vol. 19, no. 2, pp. 87-92, 2007. ↩︎
K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. ↩︎
See Appendix: RDE Traceability Matrix. ↩︎
National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
Australian Bureau of Statistics, Census of Population and Housing 2021: Housing, Australian Government, 2022. ↩︎
The benchmark survey of comparable Design Science Research theses finds descriptive-only quantitative reporting; the closest descriptive-only peers are Hjelseth (2015), Solihin (2016), and Khudhair (2022). ↩︎
A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75-105, 2004. ↩︎
D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7-44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
J. R. Venable, J. Pries-Heje, and R. Baskerville, “FEDS: A Framework for Evaluation in Design Science Research,” European Journal of Information Systems, vol. 25, no. 1, pp. 77-89, 2016, doi: 10.1057/ejis.2014.36. ↩︎
C. Sonnenberg and J. vom Brocke, “Evaluations in the Science of the Artificial – Reconsidering the Build-Evaluate Pattern in Design Science Research,” in DESRIST 2012, Lecture Notes in Computer Science, vol. 7286, Springer, 2012, pp. 381-397. ↩︎
S. Gregor and A. R. Hevner, “Positioning and presenting design science research for maximum impact,” MIS Quarterly, vol. 37, no. 2, pp. 337-355, 2013. ↩︎
D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7-44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
E. Kamsties and D. M. Berry, “Detecting ambiguities in requirements documents using inspections,” in Workshop on Inspection in Software Engineering (WISE 2001), 2001. ↩︎
D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7-44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
K. R. Larsen, R. Baskerville, and J. R. Venable, “Theories and Models of Design Science Research Validity,” Information Systems Journal, vol. 35, no. 1, pp. 36-69, 2025. ↩︎
T. D. Cook and D. T. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally, 1979. ↩︎
W. M. K. Trochim and J. P. Donnelly, The Research Methods Knowledge Base, 3rd ed. Mason, OH: Atomic Dog/Cengage Learning, 2008. ↩︎
T. D. Cook and D. T. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally, 1979. ↩︎
W. M. K. Trochim and J. P. Donnelly, The Research Methods Knowledge Base, 3rd ed. Mason, OH: Atomic Dog/Cengage Learning, 2008. ↩︎
T. C. Redman, “The Impact of Poor Data Quality on the Typical Enterprise,” Communications of the ACM, vol. 41, no. 2, pp. 79-82, 1998. ↩︎
R. Y. Wang and D. M. Strong, “Beyond Accuracy: What Data Quality Means to Data Consumers,” Journal of Management Information Systems, vol. 12, no. 4, pp. 5-33, 1996, doi: 10.1080/07421222.1996.11518099. ↩︎
K. J. Arrow, The Limits of Organization. New York: W. W. Norton, 1974. ↩︎
P. Milgrom and J. Roberts, “The Economics of Modern Manufacturing: Technology, Strategy, and Organization,” American Economic Review, vol. 80, no. 3, pp. 511-528, 1990. ↩︎
National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
M. D. Myers and J. R. Venable, “A Set of Ethical Principles for Design Science Research in Information Systems,” Information & Management, vol. 51, no. 6, pp. 801-809, 2014, doi: 10.1016/j.im.2014.01.002. ↩︎
S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251-266, 1995, doi: 10.1016/0167-9236(94)00041-2. ↩︎