Methodology
4.1 Introduction: From Theory Contract to Method Contract
Theoretical specification without procedural commitment is methodologically inactive. A demanding inheritance burden can fix what an artefact suite must achieve and under what warrant, yet leave wholly open how its claims are produced, how the supporting evidence is assessed, and how the resulting verdicts are adjudicated. That gap is where a design theory either becomes testable or remains a conceptually attractive promise; a proposition that names no procedure for refuting it cannot, on our reading, be said to make an empirical demand at all. Chapter 3 closed precisely at this edge. Five propositions (Proposition 1 to Proposition 5), five meta-requirements (Meta-Requirement 1 to Meta-Requirement 5), and a Gregor–Jones design-theory anatomy established the warrant for the suite, but stopped short of any executable procedure for putting that warrant under load. The work here resolves the gap by specifying the operational bridge between theory and execution, so that each inherited proposition arrives at the artefact chapters already equipped with the measure, baseline, and adjudication rule that would let it fail.
The remit is bounded to that bridge and no further. It states how claims will be tested before Chapters 5 to 9 execute those tests, and how contribution boundaries are preserved for Chapters 10 and 11. The orientation is protocol-facing rather than rhetorical, and the distinction matters: a method contract earns its authority by being specified in advance of the evidence, not by being narrated after it. It does not review design-science literature, which is Chapter 3’s justificatory knowledge; it does not construct artefacts, the task of Chapters 5 to 9; it does not report evaluation results, reserved for Chapter 10; and it does not interpret contributions, the work of Chapter 11. Everything downstream conforms to the contract fixed here. The remainder of this section accepts the inheritance from theory, names the argument spine that organises the commitments, grounds the contract in the problem domain through six environment requirements, and previews the cumulative sequence of the chapter.
Inheritance acceptance. Six methodological elements must be defined before any artefact chapter can proceed.1 Chapter 3 Section 3.6 imposes each as a binding inheritance obligation, and each is accepted here in turn.
- The unit of change. Design features (
DF-05A-01toDF-08D-02) constitute the unit of change. Each specifies a targeted artefact capability that satisfies one or more environment requirements through a declared mechanism object, so that every intervention can be evaluated individually rather than as part of an undifferentiated whole. - The invariant format. Evaluation measures (
EM-4W-01toEM-09-04) supply the invariant format in which criteria are expressed. Each carries a metric definition, a baseline, a unit, a threshold, and its validity risks, so that no measure is admitted without a stated way of failing. - The trigger-to-check mapping. Four design–evaluate cycles (Cycle 1 to Cycle 4) fix which design changes trigger which evaluation checks, binding artefact construction to its corresponding evaluation logic rather than leaving the linkage to be inferred after the fact.
- The baseline. Comparator conditions are declared for each evaluation question and grounded in current-practice workflows, so that every proposition is judged against a defensible standard rather than an arbitrary reference point.
- The evidence objects. Five evidence classes (
EVID-P1-INTERPRETABILITYthroughEVID-P5-BURDEN) name the specific data artefacts that may support or refute each claim, each accompanied by threat annotations and uncertainty boundaries. - The adjudication rules. A four-result scheme — pass, fail, indeterminate, and boundary-limited acceptance — classifies the evidence, with explicit criteria and a dedicated composite rule for Proposition 5.
These six are binding commitments, not editorial preferences. Taken together they convert the inheritance from theory into a closed specification: a discrete unit that can be altered, a fixed format in which success is expressed, a declared mapping from change to check, a defensible standard of comparison, a named set of evidence-bearing artefacts, and a rule for reading those artefacts as verdicts. No subsequent chapter can rescue a missing methodological specification with an attractive artefact or a persuasive narrative; the obligation is discharged here or it remains open.
Argument spine. Four spine codes organise the commitments from problem to contribution. SP-01 (Problem) formalises the environment burden through six environment requirements (ER-01 to ER-06), each a governance obligation derived from the housing-adaptation problem established in Chapters 1 and 2. SP-02 (Artefact) instantiates the mechanism objects — interface, invariant, transformation, and trigger — through design features distributed across the artefact chapters. SP-03 (Evaluation) declares the evaluation logic through five evaluation questions (EQ-01 to EQ-05) with linked measures, pre-registered baselines, thresholds, and adjudication rules. SP-04 (Contribution) extracts reusable prescriptive outputs through contribution rules (TR-01 to TR-04) with explicit scope and falsifier discipline. The spine codes recur throughout the chapter as the connective tissue that keeps each procedural choice traceable to its purpose.
Environment requirements overview. Six environment requirements ground the method contract in the problem domain. Each is a stakeholder-facing governance obligation derived from Chapters 1 and 2, not invented by the method chapter, and each is introduced here at overview level; operationalisation is distributed across the evaluation-linkage and ethical-stakeholder-mapping sections.
ER-01: reduce life-course burden for households and residents through semantically stable representations.ER-02: reduce tacit interpretation load for practitioners and assessors through explicit interfaces.ER-03: provide credible utility evidence for project owners and providers through matched comparisons.ER-04: ensure relation-level interpretability for SDA participants and assessors.ER-05: reduce specialist dependence for design and review teams through transparent trigger-to-check logic.ER-06: prevent exception-driven governance drift for maintenance and governance actors.
Each requirement traces to a specific source in the earlier problem analysis. The derivation trace below maps each requirement to that source so that examiners can locate the originating evidence in Chapter 1 and Chapter 2:
| ER | Governance Obligation | Source Evidence (Chapter 1 / Chapter 2) |
|---|---|---|
| ER-01 | Reduce life-course burden | Chapter 2 Section 2.1: diachronic housing mismatch as dynamic person–environment fit, with residential inertia developed at Section 2.2; Chapter 1 Section 1.2: adaptation cost as barrier |
| ER-02 | Reduce tacit interpretation load | Chapter 2 Section 2.10: representational-governance problem class; synchronous artefact interpretation variance |
| ER-03 | Provide credible utility evidence | Chapter 2 Section 2.11: meta-requirement MR5 (evaluation-readiness); verification burden evidence |
| ER-04 | Ensure relation-level interpretability | Chapter 2 Section 2.10: representational-governance problem class — interpretability across handover and round-trip continuity |
| ER-05 | Reduce specialist dependence | Chapter 2 Section 2.5: interface discipline gap; ungoverned complement coupling |
| ER-06 | Prevent exception-driven governance drift | Chapter 2 Section 2.10: representational-governance problem class — governance drift under ungoverned representational form and unauditable variation |
These requirements are formalised here as the testable obligations that the artefact suite must address and the evaluation programme must assess. They convert a diffuse problem narrative into a finite set of governance targets against which procedural success can later be judged, and each names the stakeholder who bears the burden when the obligation goes unmet — households and residents under ER-01, practitioners and assessors under ER-02 and ER-04, owners and providers under ER-03, design and review teams under ER-05, and maintenance and governance actors under ER-06. Anchoring the contract to those bearers, rather than to the convenience of measurement, is what keeps the evaluation programme answerable to the problem domain it claims to serve.
Section roadmap. The chapter builds cumulatively across seven sections, each adding a layer to the contract before the artefact chapters draw on it; later sections presuppose the constructs that earlier sections define, so the order is load-bearing rather than presentational. Section 4.2 (Research Paradigm and the Inheritance from Theory) justifies design science as the governing paradigm, positions the contribution type and theory-maturity level, and binds Chapter 3’s locked commitments to their operationalisation surfaces. Section 4.3 (Design Cycles, the Data Pipeline, and the Census Basis of Evidence) structures artefact construction through the four cycles, traces provenance through the layered pipeline, and explains why the corpus is treated as a census rather than a sample. Section 4.4 (Evaluation Questions and Measures) declares the five evaluation questions with their linked measures and pre-registered baselines. Section 4.5 (Adjudication, Thresholds, and the P5 Composite Criterion) fixes the four-result scheme and operationalises the composite rule for integrated burden. Section 4.6 (Validity and Reliability) registers threats across the technical, purpose, and instrument dimensions and defines reliability as repeated-execution consistency. Section 4.7 (Ethical Considerations and the Handoff Contract) maps stakeholder groups to ethical obligations and specifies what Chapters 5 to 11 are bound to execute, evaluate, and report.
4.2 Research Paradigm and the Inheritance from Theory
Design Science Research is the governing paradigm here because the thesis object is a designed artefact suite whose evaluation demands construction-and-test logic, not observational explanation or interpretive reconstruction. The research problem set out in Chapters 1 and 2 is interventionist: representational governance failures — semantic drift across handovers, ungoverned interface coupling, non-replayable transformations, and unauditable variation — are resolved by building and evaluating new artefacts, not by observing existing practice.2 Three reasons ground the choice. First, the research object is a designed artefact suite whose properties are specified before construction and judged against pre-declared criteria, satisfying Hevner et al.’s Guidelines 1 through 3 (design as artefact, problem relevance, design evaluation).3 Second, Action Research was considered and rejected: it requires a single organisational setting in which the researcher intervenes in ongoing practice, whereas this system is constructed from regulatory standards text and evaluated through demonstration cases rather than situated intervention.4 Third, interpretive case study was rejected because the artefact does not yet exist as a phenomenon to observe; the demonstration cases are constructed test contexts governed by the evaluation protocol, not naturally occurring settings from which explanatory theory is induced.5 The selection rests on problem–artefact fit.
The commitments adopted here inherit from three works that pre-date the Hevner–Peffers consolidation, and naming them protects the methodology against the critique that DSR is a recent and consequently under-tested paradigm. Walls, Widmeyer and El Sawy formalised the information system design theory — kernel theories, meta-requirements, meta-design, design method, and testable hypotheses — fixing the form a design contribution must take to count as theoretical; this is the structural ancestor of the Gregor–Jones anatomy adopted in Chapter 3 Section 3.5, against which the propositions Proposition 1 through Proposition 5 are framed.6 Nunamaker, Chen and Purdin established the systems-development research methodology — concept, system architecture, analysis and design, then build-observe-evaluate — the precedent under which a DSR programme whose object is a working software artefact gains methodological standing.7 Kuechler and Vaishnavi analysed the anatomy of a DSR project, warranting the treatment of Proposition 1 through Proposition 5 as kernel-theory hypotheses that the design cycles can refine; on this reading the Cycle 4 exception-budget mechanism is a kernel-theory refinement, not methodological drift.8
The thesis follows the six-stage Peffers et al. DSRM, with each stage mapped to chapters so the process stays auditable. Problem identification and motivation occupy Chapters 1 and 2, where the environment requirements ER-01 to ER-06 formalise the problem. Definition of objectives spans Chapters 2 and 3, converting the meta-requirements Meta-Requirement 1 to Meta-Requirement 5 into the propositions Proposition 1 to Proposition 5. Design and development are distributed across the artefact chapters (Chapters 5 to 9), each instantiating design features under explicit mechanism objects and governed by the method contract declared in this chapter. Demonstration and evaluation occur in Chapter 10, which executes the pre-registered evaluation questions EQ-01 to EQ-05 with their declared measures. Communication spans Chapters 11 and 12, which extract contribution rules TR-01 to TR-04 under explicit scope and falsifier discipline. The process is iterative: refinement is documented within each artefact chapter’s design cycles rather than as undocumented inter-chapter revision. Hevner’s three-cycle view supplies a complementary lens — the relevance cycle connects to the problem environment (the ERs), the rigor cycle to the knowledge base (wicked-problem, complex-adaptive-system, modularity, platform, and Gregor–Jones theory), and the design cycle to artefact construction and evaluation.9
Figure 4.4 — Thesis Alignment with Hevner’s Three-Cycle DSR Framework Thesis positioning within Hevner et al.’s (2004) three-cycle design science research framework. The relevance cycle connects the housing-adaptation problem context to the environment requirements; the rigor cycle grounds the design in wicked-problem theory, modularity theory, and platform architecture; and the design cycle iterates between artefact construction (Chapters 5 to 9) and evaluation (Chapter 10), with bidirectional feedback to both.
Two positioning claims are kept on distinct axes and not conflated. In Gregor and Hevner’s contribution-type taxonomy the work is primarily exaptive: modularity theory, relational-graph topology, and text-as-notational-medium are each established in their home domains, and the integrated design theory for this housing-governance application class is the contribution we develop.10 A secondary improvement claim also holds, since the work offers a new representational solution for the known problem of diachronic housing adaptation. At the theory-type level the work is a nascent design theory — Level 2 in Gregor and Hevner’s maturity framework — specifying constructs, design principles, and form–function commitments that can be criticised, operationalised, and tested; progression to a well-developed theory (Level 3, corresponding to Gregor’s Type V) is contingent on the artefact-and-evaluation programme producing positive evidence.11 The exaptive characterisation describes the kind of knowledge contribution; the nascent-to-well-developed trajectory describes how far along the evidential spectrum it has progressed. The artefact suite is classified at the thesis level as a composite of four March-and-Smith artefact types: the standardisation schema is a model with secondary method properties, the Governed Kernel Architecture a model with secondary construct properties, the notation a method with secondary construct properties, and the generator an instantiation with secondary method properties.12 Each per-artefact classification is declared and justified in its chapter. These artefacts are the evidence machinery through which the single architectural claim is discharged, not four independent contributions.
Philosophical stance
The philosophical stance is pragmatist. The artefact suite is judged by its utility in improving representational governance — reducing interpretation variance, bounding verification scope, enabling transformation replay, and governing variation under shared rules — rather than by correspondence to an independently existing representational reality. This orientation is consistent with DSR’s design-oriented epistemology, in which knowledge is produced by creating artefacts that demonstrate, or fail to demonstrate, desired properties under specified conditions.
Pragmatism does not entail relativism about evidence. The evaluation protocol declared in Section 4.5 pre-commits to baselines, thresholds, and falsifiers that bind regardless of how attractive the design may appear, so the artefact can fail the evaluation honestly. A brief articulation of the stance is itself warranted: empirical analyses of DSR doctoral work report that a substantial share of theses leave the ontological or epistemological basis for judging artefact merit unstated, and this section closes that gap.
DSR’s own limitations are acknowledged rather than assumed away. Iivari identifies a transfer-boundary tension — design principles may not transfer without adaptation, and the methodology offers no systematic way to predict where transfer fails; here the artefact is designed for Australian SDA-governed housing, so transfer to other regulatory or cultural contexts is a boundary condition, not a guarantee.13 Venable raises the related concern that constructed-scenario evaluation provides weaker support for prescriptive claims than naturalistic field evidence.14 The study answers this by classifying its evaluation as artificial-summative under the FEDS Technical-Risk-and-Efficacy strategy: the declared evaluation ceiling is efficacy under constructed demonstration, not effectiveness under naturalistic field deployment, with naturalistic evaluation identified as future work. Contribution claims are bounded accordingly. The boundary conditions follow from this posture: the theory applies where adaptation recurs, where representational handover materially affects decisions, and where verification burden is a real constraint, while universal policy optimisation, one-shot bespoke settings, and contexts in which handover quality is non-critical lie out of scope.
Action Design Research acknowledgement
This study is not executed as a full Action Design Research programme. ADR requires embedded organisational intervention with concurrent theory development and practice shaping, whereas the present work constructs artefacts from regulatory text and evaluates them through demonstration cases rather than organisational deployment.15
The study nonetheless retains three ADR-informed disciplines for auditability. Intervention records document each design cycle’s change and its rationale; reflection records document what each evaluation learned and what was adjusted; and iteration records document each artefact chapter’s design-cycle history, including negative outcomes and abandoned alternatives. These records support audit, not claims about organisational practice.
All five of Chapter 3’s locked commitments are operationalised through the method contract, each with a declared primary and secondary operationalisation surface plus an adjudication anchor. The mapping is set out in the table below, relocated from the chapter introduction so the binding text sits alongside the paradigm that warrants it; primary operationalisation is where a commitment’s testable form is declared, secondary operationalisation where its evidential or methodological consequences are bound. The adjudication anchors cite the evidence-mechanism identifiers (the EM-4W-* and EM-09-* codes) and the evidence-object identifiers (the EVID-* codes) declared in Section 4.4. No commitment is operationalised solely at the rhetorical level.
| Chapter 3 Section 3.6 Locked Commitment | Primary Operationalisation (this chapter) | Secondary Operationalisation (this chapter) | Adjudication Anchor |
|---|---|---|---|
| (i) Revision-governed adaptation | Section 4.3 cycle structure (Cycle 1 to Cycle 4) — each cycle’s intent–change–evaluation–decision loop treats adaptation as iterative revision rather than terminal optimisation | Section 4.4 EQ-05 (integrated burden) with the Section 4.5 P5 composite criterion — burden is assessed across repeated change episodes rather than at a single static state | Four-result classification (pass / fail / indeterminate / boundary-limited acceptance) applied to repeated cycle outcomes |
| (ii) Modularity as interface-governed coupling control | Section 4.4 design-feature layer — DF-05A-01 to DF-08D-02 declare interface and trigger objects as the unit of change |
Section 4.4 EQ-02 (verification scope) with the Section 4.6 technical-validity layer — bounded-edit verification is the test of interface-governed coupling | EM-4W-02 + EM-09-02 with EVID-P2-CHECK-SCOPE |
| (iii) Platform architecture (stable core + governed instance library) | Section 4.4 EQ-04 (governed variation) and the Cycle 4 specification — variant generation under shared rules with explicit exception budgeting | Section 4.7 ER-06 stakeholder mapping — governance actors absorb the consequence of exception drift if platform discipline fails | EM-4W-03 + EM-09-04 with EVID-P4-VARIATION and EVID-P4-EXCEPTION-BUDGET |
| (iv) Design theory as Gregor–Jones contract | Section 4.2 paradigm justification with Section 4.5 adjudication rules — establishes the design-theory machinery and operationalises the falsifier discipline declared in Chapter 3 Section 3.5 | Section 4.6 validity layers — instrument and purpose validity controls protect the proposition register from instrument bias and case-mismatch threats | Four-result scheme with the P5 composite criterion |
| (v) Trigram-relational-graph-text substrate | Section 4.4 EQ-01 (interpretive divergence) — primary test of P1 where the trigram is the atomic governance unit | Section 4.3 corpus provenance and pipeline architecture, with Section 4.4 EQ-02 and EQ-04 — the topology and serialisation medium are exercised in interface and variation tests | EM-4W-01 + EM-09-01 with EVID-P1-INTERPRETABILITY and EVID-P1-SDA-TRACE |
4.3 Design Cycles, the Data Pipeline, and the Census Basis of Evidence
The paradigm commitments set out in Section 4.2 become an auditable iteration structure of four design-evaluate cycles operating over a provenance-controlled data substrate. Hevner’s three-cycle view structures the design cycle as the iterative process through which artefact construction and evaluation proceed, taking inputs from the relevance cycle (problem requirements) and the rigor cycle (theoretical foundations) and returning tested evidence to both.16 Each cycle here follows a four-step structure: intent (what the cycle aims to achieve and why), change (what artefact construction or modification is undertaken), evaluation (what measures and evidence objects assess the change), and decision (whether the outcome supports continuation, adaptation, or escalation). The structure forces every artefact chapter to leave an auditable record of what was tried, observed, and concluded, including negative or boundary-limited outcomes. It operationalises Peffers et al.’s feedback loop between evaluation and design as a repeatable cycle rather than a single pass.17 We define four cycles, one per artefact, because each artefact discharges a distinct governance function with its own design-change object. A fifth cycle for Proposition 5 (integrated utility) was considered and rejected: Proposition 5’s evidence comes from the interaction of the Cycle 1 to Cycle 4 outputs exercised together in Chapter 10, not from a fifth construction target, and a cycle without a construction target would violate the requirement that each cycle produce a testable artefact change.
Four cycle specifications
The cycle table binds construction to evaluation. It records each cycle’s intent, change, evaluation, and decision logic together with the measures (EM-*) and evidence objects (EVID-*) that tie the cycle to the evaluation protocol in Section 4.4.
| Cycle | Intent | Change | Evaluation | Decision | Linked Measures | Linked Evidence |
|---|---|---|---|---|---|---|
| Cycle 1 | Stabilise standards semantics for cross-actor interpretability | Apply serialisation constraints in the standardisation schema (Chapter 5) | Compare interpretation divergence to baseline | Keep or adapt schema rules | EM-4W-01, EM-09-01 |
EVID-P1-INTERPRETABILITY, EVID-P1-SDA-TRACE |
| Cycle 2 | Bound verification scope for local edits | Publish interface and trigger contracts in the Governed Kernel Architecture (Chapter 6) | Compare local-to-global check-scope ratios | Keep or escalate interface boundaries | EM-4W-02, EM-09-02 |
EVID-P2-CHECK-SCOPE, EVID-P2-LOCAL-GLOBAL |
| Cycle 3 | Ensure transformation replay reliability | Formalise deterministic transformation logic in the notation (Chapter 7) | Run replay and invariant checks | Keep or refine transformation grammar | EM-4W-02, EM-09-02 |
EVID-P3-REPLAY, EVID-P3-INVARIANTS |
| Cycle 4 | Preserve governance under variation | Apply rule-compliant generation with exception budgeting in the generator (Chapter 9) | Assess compliant variation and exception quality | Keep or refine exception policy | EM-4W-03, EM-09-04 |
EVID-P4-VARIATION, EVID-P4-EXCEPTION-BUDGET |
Each cycle addresses one proposition. Cycle 1 tests whether the serialisation schema developed in Chapter 5 materially reduces interpretive divergence relative to the unserialised regulatory text (Proposition 1). Cycle 2 tests whether the declared interfaces and trigger contracts of the modular library in Chapter 6 convert broad reconstruction into local checking plus rule-governed escalation (Proposition 2). Cycle 3 tests whether the notation grammar of Chapter 7 yields deterministic replay with invariants retained across governed transformations (Proposition 3). Cycle 4 tests whether the procedural generation pipeline of Chapter 9 produces legitimate local variation without losing comparability, compatibility, or traceability (Proposition 4).
Decision-criterion specification
The four decision verbs are pre-registered with explicit triggering criteria so that artefact chapters cannot adjudicate cycle outcomes by retrospective narrative. A cycle outcome is classified by what its measures show, calibrated to the four-result scheme in Section 4.4. Keep applies when all linked measures meet their pre-registered thresholds and no falsifier from the Chapter 3 proposition-contract matrix fires: the configuration carries forward unchanged. Adapt applies when at least one measure is boundary-limited and the shortfall localises to a specific design-feature attribute rather than the mechanism object: the affected attribute is modified, not the mechanism, and the boundary is declared. Escalate applies when at least one measure fails with no admissible boundary explanation and the failure implicates the mechanism object, or when a Chapter 3 falsifier fires: the cycle halts and re-enters the design space at the mechanism level. Refine applies when at least one measure is indeterminate, the evidence being insufficient to classify pass or fail (for example, an inadequately specified matched baseline or inconclusive instrument output): the instrument or baseline is sharpened before a verdict is assigned. The criteria are exhaustive at the per-measure level, and the cycle’s overall decision is the strongest verb any measure triggers — escalate dominates adapt, which dominates refine, which dominates keep. The ordering reflects the design-economy principle that mechanism-level failures cannot be discharged by attribute-level repairs and that indeterminate measures must be resolved before they are relied upon.
Inter-cycle dependencies and integrity rules
The cycles form a dependency chain rather than a parallel set. Cycle 1 produces semantically stabilised representations; without stable anchors, Cycle 2 cannot define meaningful module boundaries, because the concepts its interfaces govern would be referentially unstable. Cycle 2 produces declared interface contracts; without these, Cycle 3 cannot formalise a grammar that must operate over declared interfaces rather than unstructured text. Cycle 3 produces a formal notation with deterministic parse and replay; without executable logic, Cycle 4 cannot implement rule-compliant generation. This mirrors the theory-to-artefact derivation in Chapter 3 Section 3.5, in which Proposition 1 enables Proposition 2, which enables Proposition 3, which enables Proposition 4; the ordering is a logical consequence, not arbitrary scheduling. Two integrity rules govern all cycles. First, negative and boundary-limited outcomes must be explicitly recorded — a cycle reporting only positive results is incomplete — which operationalises DSR Principle P6.18 Second, each cycle must reference at least one evaluation measure and one evidence object, so that cycles cannot become narrative accounts of construction without verifiable evaluation. Chapters 6 to 9 are bound to the same discipline: a minimum of three documented iterations each, with at least one negative or boundary-limited outcome.
Figure 4.1 — DSR Cycle Architecture The four design-evaluate cycles (Cycle 1 to Cycle 4) and their inter-dependencies. Each cycle targets one design proposition (Proposition 1 to Proposition 4) and produces one artefact (Chapters 5 to 7, 9). Sequential dependencies flow from semantic stabilisation (Cycle 1) through interface bounding (Cycle 2) and transformation reliability (Cycle 3) to governed variation (Cycle 4); Proposition 5 spans all four cycles as cross-cycle integration.
Design cycle evidence and the Proposition 5 cross-cycle position
The cycle structure has already been exercised in the standardisation-schema programme. Its design-cycle register documents nine explicit iterations (0 through 7b) across four analytical cycles, comprising 39 total experiments that span corpus-integrity baselining, polysemy measurement, schema construction, reproducibility stress testing, adversarial remediation, and integrated assessment (see Chapter 5 Section 5.8). Each iteration records intent, change, evaluation, and decision in the four-step format, including negative outcomes — eight duplicate clause identifiers retained as a design constraint rather than suppressed — and boundary-limited acceptances such as holdout generalisation that demonstrates coverage stability while acknowledging domain-specificity limits. The register is concrete evidence that the discipline is executable rather than aspirational. Proposition 5 (integrated utility under diachronic burden) occupies a cross-cycle position: it is informed by each cycle’s evidence but adjudicated only at the summative level in Chapter 10, where the outputs of Cycles 1 to 4 are exercised together on matched dwelling cases. Its operationalisation — the materiality threshold, composite rule, and learning-effect protocol — is specified in Section 4.5.
The data pipeline
Data collection and transformation are organised as provenance-safe pipelines so that every claim traces from requirement origin through design-feature instantiation to evaluation output. Provenance is a methodological necessity for DSR rigour rather than an administrative convenience: Hevner et al.’s Guideline 5 requires a transparent and repeatable construction and evaluation process,19 and provenance control prevents two specific failures — orphan claims (assertions not linked to a declared requirement) and ghost requirements (requirements no design feature addresses). The architecture has four control layers, each transforming the prior layer’s outputs.
Requirement intake layer. The six environment requirements (ER-01 to ER-06), derived from the problem analysis in Chapters 1 and 2, become coded requirement cards with explicit stakeholder anchors, governance functions, and downstream traceability links. The requirements are testable obligations with declared measures and falsifiers, not aspirational goals.
Design feature layer. The design features (DF-05A-01 to DF-08D-02) are artefact capabilities instantiated across the construction chapters, each satisfying one or more requirements through a declared mechanism object. A no-orphan integrity rule binds every requirement to at least one feature and every feature to at least one requirement; the RDE traceability matrix enforces this across all nine rows.20
Measure layer. The evaluation measures (EM-4W-01 to EM-09-04), defined in Section 4.4, carry pre-registered comparator logic with no post hoc threshold adjustment. They are declared before construction and remain fixed unless a cycle explicitly revises them with documented justification.
Evidence layer. The evidence objects (EVID-P1-INTERPRETABILITY through EVID-P5-BURDEN) are the data artefacts that support or refute proposition claims. Each carries an uncertainty annotation declaring what it can and cannot support, which guards against over-interpretation of partial results.
The data-object transformation table specifies, for each class, its source path, transformation, consumer, and risk control.
| Data Object | Source Path | Transformation Step | Consumer | Risk Control |
|---|---|---|---|---|
| Requirement cards | Problem analysis dossier and register modules (Chapters 1–2) | Requirement coding and normalisation | Section 4.4; Chapter 10 evaluation | Coding rules and audit trail |
| Design feature catalogue | RDE traceability matrix | Feature-to-requirement linking with mechanism-object classification | Chapters 5–9 artefact chapters | No-orphan linkage checks |
| Measure specifications | Chapter 4 evaluation workbench | Threshold and baseline registration with pre-declared comparator logic | Section 4.4; Chapter 10 evaluation | Pre-registered comparator logic; no post hoc adjustment |
| Evidence objects | Case outputs and result logs (Chapters 5–9) | Aggregation and adjudication against declared measures | Chapter 10 evaluation and contribution extraction | Uncertainty annotation; declared evidence-boundary statements |
A binding repeatability principle governs every step: each transformation must be reproducible under declared settings, and a step that cannot be reproduced cannot be cited as evidence. The principle requires the same analytical conclusions from the same inputs and settings, not identical numerical output across all environments. Chapter 5 Section 5.14 demonstrates this through the reproducibility stress test of the standardisation-schema serialisation pipeline. This module owns the pipeline architecture; the section is titled to reflect that the thesis involves genuine computational pipelines — serialisation, parsing, corpus transformation, and rule-compliant generation — rather than conventional data-gathering procedures.
Figure 4.2 — Four-Layer Data Pipeline Architecture The four-layer data pipeline governing provenance across the thesis. The requirement-intake layer (ER-01 to ER-06) feeds the design-feature layer (DF codes), which feeds the measure layer (EM codes), which feeds the evidence layer (EVID objects). Three control principles — no-orphan integrity, repeatability, and provenance traceability — govern all inter-layer transformations.
Corpus provenance and the census basis of evidence
The primary data source for Cycle 1 is the NDIS Specialist Disability Accommodation Design Standard, Version 1 (October 2019), comprising 611 clause records across 25 design categories plus 189 design requirements extracted from 19 normative figures.21 Its extraction method, integrity controls, and scope declarations are owned by Chapter 5 Section 5.3, preserving the anti-overlap boundary between the method chapter and the artefact chapter. Two provenance commitments apply. The corpus is treated as a closed regulatory text, so generalisability to other standards instruments is a declared boundary condition rather than an assumption. The corpus integrity audit — zero missing keys, zero truncation markers, eight flagged duplicate identifiers retained as design constraints — establishes a baseline quality level that downstream transformations must maintain or document departures from.
The second primary data source supports the empirical-substrate design cycle (Cycle 4): a corpus of 745 post-validation single-component Australian residential floor plans, assembled in two construction strata of 572 plans in the October stratum and 173 in the August stratum. The size was chosen on two grounds, both design rationale rather than statistical representativeness: analytical adequacy, the corpus being large enough to expose distributional regularity and stabilise the coupling structure while small enough to preserve per-plan agent-manual rigour; and coverage context, since 745 plans enumerate roughly one in ten thousand of Australia’s national separate-house stock — the 2021 Census records 10,852,208 private dwellings, of which 70.1 per cent are separate houses, approximately 7.6 million — offered as an indication that the corpus is non-trivial in scale and explicitly not as a basis for statistical generalisation to that stock.22 The codification through which the corpus was extracted — the space-category taxonomy, the extraction ruleset, and the coupling-classification method — was itself worked out through staged piloting in the design-science manner: a superseded application-programming-interface extraction iteration that was trialled and pivoted to agent-manual multimodal-vision extraction, a first-batch validation pilot of roughly the first fifty plans that calibrated the ruleset, and then the full corpus processed in monitored waves. It is extracted under the agent-manual canonical protocol whose extraction method, staged development record, and dimensional, occurrence, topological, and configurational analyses are owned by Chapter 8 Section 8.10, again preserving the anti-overlap boundary. The corpus is a complete census, not a probability sample, of the screened stock for the strata it enumerates: every plan that passes the categorisation gate is included, and no sampling frame mediates between the strata and the records analysed. This status is methodologically load-bearing, and its consequence is binding for the whole thesis. Inferential statistics exist to quantify sampling variability — the uncertainty about an unknown population parameter estimated from a sample. Over a census there is no sampling variability to quantify, so a confidence interval, significance test, or resampling distribution computed on it has no referent. Accordingly, no inferential statistic is reported anywhere in this thesis: no p-value, confidence interval, significance test, chi-square, analysis of variance, significance-bearing regression or correlation, and no bootstrap, permutation, or jackknife resampling. This restraint is the more rigorous choice, not a concession: a read-only survey of eight comparable Design Science Research theses in the built-environment, mass-customisation, and BIM literatures found that none reports an author-computed inferential statistic, and all use descriptive evidence.23
Three drafting commitments follow and bind the quantitative reporting in Chapters 5, 6, 8, and 10.
Descriptive by default. Quantitative evidence is reported as counts, proportions, frequencies, coverage, agreement rates, and runtimes, each with explicit denominators.
Thresholds as transparent design decisions. Every numeric threshold is stated as a design decision, with its sensitivity shown by plain counts at the adjacent cut-offs rather than by a resampling distribution.
Validation reserved for functional demonstration. The word “validation” denotes functional demonstration, not statistical confirmation, consistent with the artificial-efficacy evaluation posture set out in Section 4.5.
Where a coupling or adjacency pattern is reported over the corpus, it is reported as a realised-frequency class; absence of a pattern at the census’s co-occurrence floor is reported as such and is not asserted as a prohibition, since a census establishes what the stock realises rather than what it forbids. So constrained, the corpus supplies descriptive ecological grounding and population-level coverage for the artefact’s primitives, and a reference distribution against which the demonstration trajectory of Chapter 10 is positioned descriptively, never a sample from which the artefact’s properties are statistically generalised. Section 4.4 specifies the evaluation strategy that consumes these data objects and applies the pre-registered questions, baselines, thresholds, and adjudication rules.
4.4 Evaluation Questions and Measures
The evaluation is requirement-linked and proposition-linked: its unit is testable correspondence between environment requirements, design features, and declared measures, not narrative plausibility or artefact attractiveness. A single principle governs the design. Every evaluation question must trace to at least one environment requirement, at least one proposition, and at least one pre-declared measure; a question that cannot demonstrate these linkages falls outside the contribution line and does not enter the protocol. The discipline matters because Design Science Research evaluation degenerates into post hoc justification when artefact merits are argued narratively rather than tested against pre-declared criteria.24 Pre-registration — declaring questions, baselines, thresholds, and adjudication rules before results are reported — prevents the thesis from adjusting its success criteria to match its outcomes. The artefact suite can therefore fail honestly, and the protocol is built to make any such failure transparent rather than deniable.
Consolidated requirements-design-evaluation traceability matrix
Table 4.1 maps the requirement, design, and evaluation constructs across Chapters 2 to 5 so their alignment is demonstrable rather than assumed. Each chapter introduces criteria at a different level of abstraction — feasibility criteria, meta-requirements, propositions, evaluation questions, success criteria — and the matrix threads them into a single chain from environment to evidence.
Figure 4.3 — Evaluation Traceability Map Network traceability map linking environment requirements (ER) through propositions (P1 to P5) and evaluation questions (EQ-01 to EQ-05) to evidence objects (EVID). Each evaluation question inherits its success criteria from the propositions and environment requirements it serves, so no evaluation claim is made without a traceable requirement origin.
Table 4.1: Requirements-Design-Evaluation Traceability Matrix
| Ch 2 Feasibility Criterion | Ch 2 Meta-Requirement | Ch 3 Proposition | Ch 4 Evaluation Question | Ch 5 Success Criterion | Evidence Object |
|---|---|---|---|---|---|
| Reversibility | MR1 (Semantic continuity) | P1 (Interpretive convergence) | EQ-01 (Interpretive divergence) | SC-01 + SC-02 | EVID-P1-INTERPRETABILITY |
| Cognitive tractability | MR2 (Bounded composability) | P2 (Interface-bounded verification) | EQ-02 (Verification scope) | SC-04 | EVID-P2-CHECK-SCOPE, EVID-P2-LOCAL-GLOBAL |
| Coordination cost | MR3 (Formal expressibility) | P3 (Replay reliability) | EQ-03 (Round-trip preservation) [primary]; EQ-01 + EQ-02 [secondary] | SC-03 | EVID-P3-REPLAY, EVID-P3-INVARIANTS |
| Institutional compatibility | MR4 (Procedural traceability) | P4 (Governed variation) | EQ-04 (Governed variation) | SC-05 | EVID-P4-VARIATION |
| Temporal sustainability | MR5 (Evaluation-readiness) | P5 (Integrated utility) | EQ-05 (Integrated burden) | — (Chapter 10 scope) | EVID-P5-BURDEN |
| Skill cost | MR1 + MR2 | P1 + P2 | EQ-01 + EQ-02 | SC-01 + SC-02 | EVID-P1-SDA-TRACE |
No feasibility criterion is orphaned, and no success criterion floats without a methodological anchor. The sixth row carries the cross-cutting skill-cost criterion, which maps to a composite of two meta-requirements and two questions rather than a single pair. Skill cost is composited from three observable components: lookup cost (locating the relevant clause or primitive in the serialised schema, addressed by MR1), recall cost (retaining the primitive vocabulary and bounded-composability rules across a verification episode, addressed by MR2), and application cost (applying the recalled primitives to a concrete step, addressed jointly by MR1 and MR2). Each component is rated low, moderate, or high against the matched-baseline equivalent, and the composite is the worst of the three — a conservative rule that prevents a low-lookup, low-recall artefact from masking a high application cost. The composite is reported alongside its three components, so the reader can inspect which dimension drives the classification.
Evaluation questions
Five evaluation questions govern the assessment programme. Each addresses a governance function, links to environment requirements and propositions, and is adjudicated through declared measures and evidence objects. Together they discharge the propositions that test the single architectural claim; they do not pre-empt the engine specified in Chapter 6.
| EQ | Question | Linked ER | Linked Measures | Proposition Anchors | Expected Evidence |
|---|---|---|---|---|---|
| EQ-01 | Does serialised semantic representation reduce interpretive divergence? | ER-01, ER-04 | EM-4W-01, EM-09-01 |
P1 (primary), P3 (secondary) | EVID-P1-INTERPRETABILITY, EVID-P1-SDA-TRACE |
| EQ-02 | Does interface-bounded modularity reduce verification scope? | ER-02, ER-05 | EM-4W-02, EM-09-02 |
P2 | EVID-P2-CHECK-SCOPE, EVID-P2-LOCAL-GLOBAL |
| EQ-03 | Does the encoded artefact preserve invariants under the round-trip cycle? | ER-01, ER-02 | EM-09-01, EM-09-02 |
P3 | EVID-P3-REPLAY, EVID-P3-INVARIANTS |
| EQ-04 | Does platform governance enable legitimate variation under shared rules? | ER-06 | EM-4W-03, EM-09-04 |
P4 | EVID-P4-VARIATION, EVID-P4-EXCEPTION-BUDGET |
| EQ-05 | Does the integrated suite reduce practical burden while maintaining reliability? | ER-03, ER-05 | EM-09-03 |
P5 | EVID-P5-BURDEN |
EQ-01 tests Proposition 1 primarily and Proposition 3 secondarily; its baseline is interpretive divergence under current synchronous-artefact workflows, and its threshold is a material reduction in divergence. EQ-02 is the primary test of Proposition 2: the baseline is monolithic representation, and the threshold is a shift from global to local scope dominance under bounded edits. EQ-03 is pre-registered on its own measures rather than inherited from EQ-01 and EQ-02, so Proposition 3’s evidence chain stands independently; the baseline is narrative or tacit change procedures, and the threshold is deterministic replay, full invariant retention, and parser rejection of invariant-violating inputs under the round-trip cycle. EQ-04 tests Proposition 4 against variation without a stable shared core, with the threshold that every variant sits within the pre-registered exception budget and every exception is typed and justified. EQ-05 is the most demanding question because it tests the integrated system rather than any single component; its operationalisation, including the Proposition 5 composite criterion, is specified in Section 4.5.
Evaluation measure specifications
The measures are pre-registered with metric, unit, baseline, threshold, data source, and validity risks; they are fixed unless a design cycle revises them with documented justification. Within-chapter measures (EM-4W-*) run during formative evaluation inside individual artefact chapters.
| Measure | Metric Definition | Unit | Baseline | Threshold | Data Source | Validity Risks |
|---|---|---|---|---|---|---|
EM-4W-01 |
Interpretation divergence for matched obligations across handovers | Divergence rate | Synchronous-artefact workflow divergence for equivalent obligations | Lower than baseline with practitioner-detectable effect | Within-chapter protocol logs | Coder bias; sampling limits |
EM-4W-02 |
Local-to-global verification scope ratio per bounded change | Ratio | Scope ratio under current monolithic-representation workflow | Local scope dominates under bounded edits | Change-trace logs | Misclassified dependencies |
EM-4W-03 |
Rule-compliant variant generation rate under exception budget | Compliance proportion | Manual exception-heavy variation process | Meets pre-registered exception budget | Generation and exception-record logs | Scenario representativeness |
Summative measures (EM-09-*) run during the Chapter 10 evaluation programme.
| Measure | Metric Definition | Unit | Baseline | Threshold | Data Source | Validity Risks |
|---|---|---|---|---|---|---|
EM-09-01 |
Standards-interpretability ambiguity-reduction across demonstration cases (aggregate resolution rate computed from clause-level trace-completeness inputs) | Aggregate resolution rate | Baseline ambiguity profile under unserialised regulatory text | Aggregate delta ≥ 0.25 relative to baseline (design decision; see below) | Chapter 10 case-output analysis | Instrument definition drift |
EM-09-02 |
Modular-fit and bounded-check performance under change scenarios | Composite score | Baseline modular-fit under monolithic representation | Improved bounded verification relative to baseline | Chapter 10 modular-fit and change-trace outputs | Comparator mismatch |
EM-09-03 |
Workflow burden delta across time, cognitive load, and skill-dependency proxies | Delta (multi-dimensional) | Baseline burden profile for matched tasks | Burden reduction with declared limits; must satisfy the P5 composite criterion | Chapter 10 workflow-comparison outputs | Confounding process changes; learning effect |
EM-09-04 |
Exception governance quality for generated dwelling variants | Quality score | Baseline unguided exception handling without shared rules | All exceptions typed and justified; rate within declared budget | Chapter 10 variation and exception evidence | Subjective coding risk |
These specifications align with the canonical property-question-measure-evidence wiring: Proposition 1 → EQ-01 → EM-4W-01 → EM-09-01; Proposition 2 → EQ-02 → EM-4W-02 → EM-09-02; Proposition 3 → EQ-03 → EM-09-01 + EM-09-02; Proposition 4 → EQ-04 → EM-4W-03 → EM-09-04; Proposition 5 → EQ-05 → EM-09-03. The EM-09-02 indicator — modular-fit and bounded-check performance — is the pre-registered summative measure for P2; any concrete sub-metric coined downstream operationalises it and does not displace it as the canonical name.
The EM-09-01 definition fixes a three-term relationship to forestall metric drift. Trace completeness is the clause-level input — the proportion of obligation-relevant identity, role, deontic, lexical, mapping, and cross-channel anchors traceable from clause to evaluation output. The schema-mediated resolution rate is the derived quantity computed from those inputs against the baseline. Ambiguity-reduction is the framing-level effect that the derived quantity operationalises. This relationship holds for the duration of the programme, and the artefact chapters report against resolution rate as the operative pass/fail surface.
The EM-09-01 threshold of an aggregate resolution rate of at least 0.25 is a transparent design decision, not a statistically derived cutoff. It is anchored to Berry and Kamsties’ synthesis of inspection-based ambiguity-reduction effect sizes, which reports the smallest reduction practitioners can reliably detect through structured inspection.25 Sensitivity is reported by plain counts at adjacent cut-offs rather than by any interval or significance test. The same 0.25 floor binds both Chapter 5 surfaces on which the resolution rate is reported — the chapter-internal computation and the cross-channel reconciliation — so a looser cutoff cannot govern one surface and render the headline finding non-falsifiable.
Proposition baselines are specified at the directional level in this chapter, because the methodology defines the evaluation architecture rather than domain-specific instruments. Each artefact chapter concretises a directional baseline into measured values: Chapter 5 Section 5.11 converts the EQ-01 specification into a six-dimensional empirical baseline. This two-stage choice prevents premature commitment to instruments that the design search may need to adapt.
FEDS efficacy classification and evaluation-pattern mapping
The strategy is classified as Technical Risk and Efficacy within Venable, Pries-Heje and Baskerville’s FEDS framework: the artefact suite is technically complex — formal grammar, graph topology, computational pipelines — so the principal risk is that it does not perform as specified rather than that users reject it, and the evaluation uses constructed demonstration cases that isolate technical performance from adoption dynamics.26 The strategy combines formative episodes within each artefact chapter (EM-4W-*) with summative episodes in Chapter 10 (EM-09-*). The declared evaluation ceiling is therefore efficacy under constructed demonstration, not effectiveness under naturalistic field deployment; naturalistic evaluation is reserved as future work.
Sonnenberg and vom Brocke’s four-pattern taxonomy locates each episode temporally and is descriptive of position only, never a defence of evidential quality.27 EVAL1 (problem identification) sits in Chapters 1 and 2; EVAL2 (solution design) in Chapters 3 and 4. EVAL3 (solution instantiation) covers artefact construction and within-chapter formative assessment; its measurement-protocol boundary admits schema-correctness, internal traceability, primitive coverage, and the EM-4W-* measures, and excludes integrated cross-artefact behaviour and any naturalistic generalisation. EVAL4 (solution in use) is the artificial-summative assessment of the integrated suite against EQ-01 to EQ-05 on two constructed demonstration cases; its boundary excludes naturalistic field deployment, longitudinal adoption, and inferential generalisation beyond the constructed scenario class.
Method choice and the evidence-balance rule
The evaluation combines three method types, each addressing a distinct evidential need. Analytical checks verify rule-level correctness — deterministic replay, invariant retention, no-orphan traceability, coverage completeness — yielding binary pass/fail evidence applied within artefact chapters. Observational traces capture change-propagation evidence: how edits propagate, what checks are triggered, and what scope ratios result. Descriptive interpretation assesses workflow-level outcomes — interpretability, tractable governance overhead, and the burden profile — as qualitative judgements bounded by the matched-scenario context. The evidence-balance rule then requires that no major Chapter 10 claim about real-world applicability rests on artificial evidence alone, and conversely that no formal-property claim rests on interpretive judgement alone.
4.5 Adjudication, Thresholds, and the P5 Composite Criterion
Each evaluation measure carries an explicit baseline and threshold declared before execution, and each result is classified into one of four categories rather than a binary verdict. A measure passes when the threshold is met with supporting evidence from at least one linked EVID-* object. It fails when the threshold is not met and no boundary explanation accounts for the shortfall. It is indeterminate when the evidence is insufficient to classify the result as pass or fail — for example, an inadequately specified matched baseline, inconclusive instrument data, or inadequate coverage of the relevant cases. It is a boundary-limited acceptance when the threshold is partially met under a declared scope constraint that explains the boundary — for example, a measure passing within the tested scenario class but not assessable for untested scenarios. Every major claim in Chapter 10 must cite at least one linked evidence object; claims without evidence linkage are interpretive commentary, not contribution assertions. The four-result scheme is non-standard in design science research, and the departure is deliberate. Binary pass/fail classification suits evidence that is unambiguous and fully controlled. A multi-artefact thesis addressing a complex governance problem instead produces partial, boundary-conditioned, or mixed evidence, and the indeterminate and boundary-limited categories let the evaluation report that honestly rather than force premature classification. These are diagnostic categories, not escape hatches: each signals where evidence falls short and what further work would resolve it.
The chapter-level thresholds are directional rather than numerical. They pre-register the direction of expected effects — reduction, improvement, bounded scope — without committing to a magnitude. The justification is methodological. The thesis evaluates a nascent design theory at what Gregor and Hevner term a Level 2 contribution, applying established theoretical foundations to a first formalisation of accessibility-standard clause governance.28 Prior requirements-ambiguity work establishes that structured inspection can reduce ambiguity to a practitioner-detectable degree, and it is on that benchmark that the instance-level thresholds below are anchored; what it does not supply is a calibrated effect-size baseline for clause-governance schemas of this specific kind. For a first formalisation, directional pre-registration is therefore the appropriate posture: an arbitrary numerical cutoff — say, a required 30% ambiguity reduction — would impose a false precision the evidence base does not support, either too lenient or too strict. What the thesis does pre-register remains substantial: the direction of effect for each evaluation question; the evaluation instruments, with defined metrics, units, baselines, and data objects; the named evidence objects linked to each proposition; and the adjudication logic, comprising the four-result scheme and the Proposition 5 composite criterion. Pre-registration covers the evaluation architecture, not the effect sizes, and it retains analytical teeth when the instruments are precise and the adjudication is honest.
Numerical thresholds as transparent design decisions
Where citation-grounded numerical thresholds can be derived from prior empirical work, three are pre-registered for the instance-level measures in the Chapter 5 standardisation-schema evaluation. They are transparent design decisions anchored to a published benchmark, not statistically derived cutoffs, and their sensitivity is shown by plain counts at adjacent cut-offs rather than by any inferential apparatus.
- Aggregate resolution rate ≥ 0.25, interpretable as a 25-point reduction on the unified ambiguity index. This is the smallest reduction that Berry and Kamsties’ synthesis of inspection-based ambiguity reduction reports as practitioner-detectable through structured inspection rather than aggregation.29 A per-dimension floor of ≥ 0.10 binds the three resolution-mode dimensions (Identity, Structural, Deontic), preventing one dominant dimension from carrying the headline number; the two dual-mode dimensions (Lexical polysemy and Mapping primitives) discharge instead through a governed-transparency criterion, the schema codifying them as auditable, confidence-labelled design objects.
- Holdout vocabulary coverage ≥ 80%, the lower-bound comparable target for domain-restricted vocabulary transferability under bounded-schema conditions.30 Below 80% the primitive set fails to absorb new corpus material without schema revision, which would falsify the transferability claim.
- Polysemy proportion ≤ 0.55 at the headline lexical layer, a position comparable to or below the technical-domain regulatory-text polysemy floor reported in prior requirements-ambiguity work, rather than an arbitrarily tighter cutoff that no published baseline supports.31 Sensitivity is reported descriptively as the counts of clauses falling at the adjacent cut-offs of 0.50, 0.55, and 0.60, so that the reader can see how close the corpus sits to the threshold without any resampling or interval claim.
The ≥ 0.25 aggregate resolution rate threshold is binding across both Chapter 5 measurement surfaces — the chapter-internal per-clause and aggregate computation in Section 5.20 and the cross-channel reconciliation in Section 5.25 — so that no looser cutoff governs the cross-channel measure (XCC-INT-03) and renders the headline finding non-falsifiable. The discrimination this register supports is descriptive, not inferential: the per-clause and aggregate deltas, the null-model contrast, and the reduced-schema layer analysis are read as descriptive comparisons that show where the framework differentiates conditions, consistent with the census-not-sample evidential basis established in Section 4.3.
The P5 composite criterion
Proposition 5 (integrated utility under diachronic burden) requires improvement on a composite criterion — reduced practical burden and maintained or improved assurance reliability — rather than on any single metric, and it is operationalised here for direct application in Chapter 10. The operationalisation discharges the obligation inherited from Chapter 3 Section 3.5, where the falsifiable floor and the learning-effect confound were identified but their procedural resolution deferred.
Materiality threshold. A material improvement is one detectable by a practitioner in a matched-scenario comparison. The standard is framed in practical, not statistical, terms because the demonstration uses constructed demonstration cases for which statistical aggregation would be inappropriate. The evaluator records, for both the artefact-supported and the matched-baseline workflow on the same scenario, three proxies: verification-step count, decision points requiring interpretive judgement, and elapsed workflow time. When the artefact-supported workflow shows a reduction on at least two of the three proxies, visible in the scenario-level data without aggregation, the difference is classified as material. The standard is conservative — it may classify genuinely small improvements as non-material — but conservatism suits a nascent theory’s first empirical claims.
Composite adjudication rule. Both burden and reliability must be non-negative relative to the matched baseline, and at least one must improve. The result is pass (burden-led) when burden improves while reliability holds or improves; pass (reliability-led) when reliability improves while burden holds; fail when either component is purchased at the cost of the other, or when neither improves; and indeterminate when the evidence is insufficient to classify either component. This rule avoids any arbitrary weighting of burden against reliability: it does not assert that a burden gain compensates for a reliability loss, because such trade-off ratios cannot be justified a priori.
Learning-effect separation protocol. Burden reduction after the artefact suite is introduced may partly reflect familiarity with any new workflow rather than the governance mechanism. To distinguish the two, the matched-baseline workflow receives equivalent exposure time. Reduction in the artefact condition but not in the matched baseline supports the mechanism claim; comparable reduction in both flags the learning-effect confound and classifies Proposition 5 as indeterminate for mechanism attribution; a substantially larger reduction in the artefact condition partially supports the mechanism, with the learning-effect contribution explicitly quantified. The protocol does not eliminate the confound — full elimination would require longitudinal controlled work beyond a single doctoral study — but it manages the confound by making it visible and classifiable.
Non-negotiable falsifiable floor. From Chapter 3 Section 3.5: the artefact suite must not increase verification burden for equivalent assurance quality. If governance overhead exceeds the verification savings it generates, the platform claim fails regardless of any other technical merit. This floor tests the thesis’s core value proposition, that explicit representational governance reduces rather than increases the cost of housing-adaptation verification. A positive Proposition 1, Proposition 2, Proposition 3, and Proposition 4 result combined with a failing Proposition 5 would mean the individual artefacts work as designed while the integrated system imposes more overhead than it saves — an honest and informative finding, but not one that would support the prescriptive design-theory claim.
Continuity commitments and temporal comparability
Four downstream continuity obligations ensure that Chapter 10 cannot silently omit evaluation responsibilities; they are binding method commitments, not optional targets. The Proposition 1 interpretability protocol requires Chapter 10 to report interpretive-divergence evidence using the EQ-01 protocol. The Proposition 2 change-scope metric requires it to report local-to-global verification scope ratios using the EQ-02 metric. The Proposition 4 variation-validity protocol requires exception-governance evidence using the EQ-04 protocol. The Proposition 5 comparative burden design requires burden and reliability evidence using the composite design above. If Chapter 10 does not address a declared obligation, the omission must be explicitly justified as a scope limitation rather than left unmentioned.
Two temporal-comparability specifications apply beyond the Proposition 5 matched baseline, discharging the requirement in Chapter 3 Section 3.6 to define a comparable starting state, intervention scope, and verification burden across change episodes. For Proposition 2 (EQ-02), the demonstration must compare verification scope before and after a bounded change that modifies elements within a single module without altering its interface contracts; where starting states differ materially in complexity, scope ratios are normalised against the declared module count. For Proposition 4 (EQ-04), the demonstration must compare variation governance with and without the platform architecture, assessing whether a requested legitimate variant is produced within the exception budget; where variation demand differs materially between scenarios, exception rates are reported relative to the declared variation space. Section 4.6 specifies the validity and reliability controls that protect these measures and evidence objects from the principal threats to their credibility.
4.6 Validity and Reliability
Validity and reliability controls protect the Section 4.4 measures and evidence objects from the principal threats to their credibility. Section 4.4 specified five evaluation questions with pre-registered measures, baselines, and adjudication rules; the question now turns from what the evaluation tests to whether the testing instruments are defensible and whether the conclusions drawn from them are warranted given the conditions under which evidence was collected. We organise threats across three layers — technical validity (does the artefact behave as specified?), purpose validity (does it improve intended outcomes in context?), and instrument validity (are the measures defensible with their limits made explicit?). This three-layer structure is triangulated across three traditions rather than dependent on one. The DSR-specific literature contributes Larsen, Baskerville and Venable’s five-type framework, which distinguishes design, internal, construct, external, and statistical-conclusion validity for IS artefacts.32 The quasi-experimental tradition contributes Cook and Campbell’s canonical four-type taxonomy, which remains the reference for threats in field-setting evaluation.33 The general research-methods tradition contributes Trochim and Donnelly’s scaffolding for validity reasoning across qualitative and quantitative designs.34 Convergence across three independent traditions guards against two risks that a recent-only citation would carry: the novelty risk that the Larsen et al. taxonomy has not yet accumulated independent confirmation, and the translation risk that DSR-specific labels may not align cleanly with the canonical labels examiners use.
The three-layer model collapses Larsen et al.’s five-type decomposition, absorbing design and external validity into purpose validity and statistical-conclusion validity into instrument validity. The collapse is justified, not convenient: in this thesis’s evaluation envelope no evidential conclusion in Chapter 10 turns on which merged sub-type is in play. Consider the strongest test case, the EQ-05 burden-reduction claim evaluated on a constructed demonstration case. A design-validity, external-validity, or statistical-conclusion failure all resolve through the same instruments — the matched-baseline protocol, the pre-registered measure threshold with its practitioner-detectable materiality criterion, and the four-result adjudication scheme — and all produce the same mitigation. Studies that rely on inferential statistics with effect-size claims over a probability sample would warrant the full five-type decomposition; this thesis does not. Its corpus is a census and its demonstration uses constructed cases, so the collapse is a justified scope-fitting choice rather than a concealment of distinctions that would alter outcomes.
Figure 4.5 — Validity Threat Matrix The three-layer validity framework. Technical validity targets parser drift and dependency misclassification through deterministic replay checks; purpose validity targets case mismatch and contextual confounders through matched-baseline protocols; instrument validity targets coding bias and comparator instability through pre-registered thresholds, dual coding, and explicit uncertainty reporting.
Validity-layer controls. The table below links each layer to its measures, principal threats, and mitigation strategy.
| Validity Layer | Core Question | Linked Measures | Principal Threats | Mitigation Strategy |
|---|---|---|---|---|
| Technical validity | Does the artefact behave as specified under declared operations? | EM-4W-02, EM-4W-03, EM-09-02 |
Parser or configuration drift; dependency misclassification | Deterministic replay checks; versioned interface controls |
| Purpose validity | Does the artefact improve intended outcomes in context? | EM-09-01, EM-09-03, EM-09-04 |
Case mismatch; contextual confounders | Matched-baseline protocol; scoped interpretation rules |
| Instrument validity | Are the measures defensible with limits made explicit? | EM-4W-01 to EM-09-04 |
Coding bias; comparator instability; subjective scoring | Pre-registered thresholds; dual coding; uncertainty reporting |
Technical validity guards against computational components diverging from their specifications, mitigated by deterministic replay and versioned interfaces; the standardisation schema’s reproducibility test in Chapter 5 Section 5.14 demonstrates both across Python 3.12 and 3.13. Purpose validity is the layer most relevant to Proposition 5, where the matched-baseline and learning-effect protocols from Section 4.4 are the primary instruments against case mismatch and contextual confounders. Instrument validity guards against systematic measurement bias, mitigated by pre-registered thresholds, dual coding, and uncertainty reporting on every evidence object.
Per-evidence threat register. Each evidence object faces specific threats that require targeted mitigation beyond the layer-level controls. EVID-P1-INTERPRETABILITY (semantic continuity) faces coder disagreement and coverage risk; mitigation is dual coding with inter-coder agreement and coverage reporting relative to the 611-clause corpus. EVID-P2-CHECK-SCOPE (bounded verification) faces hidden dependency escalation; deterministic replay traces change propagation against declared interface boundaries. EVID-P3-REPLAY (transformation replay) faces replay-boundary risk, where outputs deterministic within the tested input space may not hold at uncovered boundary conditions; mitigation is boundary-condition testing with declared input-space coverage. EVID-P4-VARIATION (governed variation) faces representativeness risk; scoped interpretation rules declare the tested and untested variation spaces, and exception budgeting types and justifies every governance exception. EVID-P5-BURDEN (integrated burden) faces a confounding process-change risk — the learning-effect confound, whereby observed burden reduction may reflect workflow novelty rather than the governance mechanism; mitigation is a matched baseline with equivalent exposure time, a pre-registered materiality threshold, and explicit classification of mechanism-caused versus familiarity-caused effects.
Three concrete criteria convert these layer-level controls into binding control surfaces, calibrated to the linked measures and to dual-coding and boundary-coverage conventions in the IS validity literature.3536 They pre-empt the common examiner critique that mitigations are stated generically by specifying numerical floors, a coverage threshold, and one additional per-threat mitigation.
Inter-coder agreement floor (Cohen’s κ). Wherever an evaluation step depends on human classification, we pre-register a Cohen’s κ agreement floor of 0.6 for primary measures and 0.4 for secondary measures. This is a reliability design decision — an inter-coder agreement threshold, a descriptive consistency check between two coders — not an inferential test, and it carries no associated significance claim. The 0.6 floor corresponds to the substantial-agreement band; primary measures are those whose pass/fail status carries a proposition-level claim (EM-09-01, EM-09-02, EM-09-03, EM-09-04). The 0.4 floor corresponds to the moderate-agreement band acceptable for the within-chapter EM-4W-* measures and subordinate dimensions. Below floor, the affected EVID-* object is classified indeterminate under the four-result scheme rather than interpreted under the lower threshold; this triggers a refine decision or a boundary-limited acceptance with the κ value reported. The floors are pre-registered and not adjusted post hoc.
Boundary-coverage threshold (stratum-cell visitation). Wherever an evaluation depends on coverage of a structured corpus, we pre-register an 80% boundary-coverage threshold: at least 80% of the corpus’s stratum cells must be visited — predicate type × deontic force × design category for the SDA standard, module-pair × bounded-edit class for modular-fit, design category × variant axis for variation. This is a coverage threshold, descriptive of which stratum cells the evaluation has visited, not a sampling inference over an unobserved population; the SDA corpus is a census of all 611 clauses, so the criterion governs completeness of visitation rather than representativeness of a draw. Coverage below 80% triggers a declared scope constraint that bounds the claim to the visited stratum cells and lists the unvisited cells. Where strata are not pre-defined — for example, novel exception types surfaced during Cycle 4 — the post-hoc stratification is reported and the threshold re-checked against it.
Per-threat additional mitigations. Each threat class receives one further mitigation beyond the layer controls. Technical-validity threats add a container or environment manifest with an alternate-environment smoke test, exemplified by the Python 3.12/3.13 stress test in Chapter 5 Section 5.14 and binding for Chapters 6 to 8. Purpose-validity threats add a negative-case control — a constructed scenario in which the artefact is not expected to help — so observed effects are distinguished from generic enthusiasm. Instrument-validity threats add publication of the coding rubrics, decision rules, and the full disagreement register as appendix material, operationalising the Cook-and-Campbell convention of reporting unresolved threats rather than concealing them.
The construct of verification debt — the cumulative cost of deferred, avoided, or inadequately executed verification — is grounded in two established traditions so that it functions as an evaluative construct rather than an ad hoc invention. The information-quality literature establishes that data-quality degradation imposes rework, error-propagation, and decision-distortion costs;3738 deferred verification propagates comparable uncertainty through subsequent decisions. The coordination-cost literature establishes that information asymmetry and interface complexity impose transaction costs on multi-actor processes;3940 each handover lacking explicit interface contracts adds overhead the next actor absorbs. This dual grounding, carried forward as a binding commitment from Chapter 9 into Chapter 10, satisfies the construct-validity requirement.
Reliability is defined as repeated-execution consistency at two levels. Computationally, pipelines, parsers, and transformation operations must produce deterministic outputs under declared settings, verified by the replay checks in the technical-validity layer. Analytically, coding and classification must produce consistent results across coders or occasions, verified by dual coding and inter-coder agreement in the instrument-validity layer. A procedure that fails the reliability test is reported, not suppressed, and its evidence is reclassified in adjudication. Single-researcher reflexivity is a declared and bounded limitation — the same person designs, executes, and adjudicates — mitigated by three controls: pre-registration prevents post hoc criterion adjustment; the Section 4.7 ethical reporting rule prevents suppression of negative results; and dual coding on interpretive evidence such as EVID-P1-INTERPRETABILITY provides inter-coder checks independent of the primary researcher. Throughout, the thesis draws no inferential-statistical conclusion over its census and advances no sample-based effect-size claim; the κ and coverage thresholds remain descriptive design decisions, consistent with the census-not-sample governance of Section 4.3.
4.7 Ethical Considerations and the Handoff Contract
Ethical responsibility in this research is operationalised through the environment requirements rather than discharged as a perfunctory compliance section. No human participants, personal data, or organisational case studies requiring ethics-committee approval are involved. The primary data source is the publicly available NDIS Specialist Disability Accommodation Design Standard, a regulatory instrument published by the Australian Government.41 The evaluation programme in Chapter 10 uses constructed demonstration cases built on hypothetical dwelling configurations, not real dwellings or occupants, and conducts no interviews, surveys, or assessments involving identifiable individuals. The ethical obligations that remain are therefore artefact-quality and stakeholder-consequence obligations rather than consent and privacy obligations. The standardisation schema, modular library, notation grammar, and procedural pipeline each claim to improve housing-adaptation governance, and so carry responsibility for the quality of their outputs and the consequences of their adoption. Should those artefacts mislead practitioners, obscure genuine compliance requirements, or create new forms of exclusion, the consequences are ethically significant even where no participant is directly harmed.
Stakeholder consequences across the environment requirements
The stakeholder groups whose interests turn on the artefact suite’s success or failure are identified through the environment requirements, ER-01 to ER-06. The table below maps each requirement to its stakeholder focus, the ethical concern at stake, the risk that arises if the concern is left unmanaged, and the mitigation commitment carried by the research design.
| ER | Stakeholder Focus | Ethical Concern | Risk if Unmanaged | Mitigation Commitment |
|---|---|---|---|---|
| ER-01 | Households and residents | Life-course burden can be misrepresented | Interventions optimise paperwork over adaptability | Burden-sensitive measures and context notes |
| ER-02 | Practitioners and assessors | Tacit interpretation load hides cognitive burden | Unsafe or overly conservative decisions | Explicit invariants and interface publication |
| ER-03 | Project owners and providers | Utility narratives may overclaim causality | Misleading utility claims | Comparator discipline and bounded interpretation |
| ER-04 | SDA participants and assessors | Feature-level pass can mask integrated usability failure | Exclusionary outcomes despite nominal compliance | Relation-level interpretability reporting |
| ER-05 | Design and review teams | Specialist dependence may become structural inequity | Adaptation locked behind scarce expertise | Trigger-to-check transparency and delegation support |
| ER-06 | Governance and maintenance actors | Exception opacity can normalise drift | Unjustified deviation from shared rules | Typed exception budgets with justification |
Three requirements warrant comment beyond the table. The most ethically consequential group is ER-01, the people for whom the housing exists: residents who require accessible dwellings that adapt to changing needs. The risk is that a standardisation schema accelerates administrative compliance while leaving genuine adaptability untouched. The mitigation binds the workflow-burden measure (EM-09-03) to adaptation outcomes, treating documentation efficiency as secondary; if the artefact suite makes compliance faster without improving the adaptability of dwelling outcomes, that finding must be reported as such in Chapter 10 and Chapter 11. ER-04 raises a distinct concern: a schema that achieves feature-level compliance, satisfying each requirement in isolation, can still fail relational integrity, where the configuration of features does not support the intended function. The evaluation addresses this through relation-level interpretability assessment and through the Proposition 5 composite criterion, which requires the integrated system to demonstrate practical utility rather than component-level success alone. ER-06 concerns governance drift: the platform architecture permits legitimate variation through governed exceptions, but exception budgets can normalise drift if justified departures accumulate without review and erode the framework. The mitigation requires every exception to be typed and justified, with governance quality assessed as a summative measure (EM-09-04).
Alignment with the Myers and Venable principles
Six ethical principles for Design Science Research provide the normative frame for the thesis’s posture, and each is either operationalised within the method contract or set aside for cause.42 Assess benefits and harms is operationalised through the stakeholder-ER mapping above: each requirement names a group and the harms that artefact failure could produce, and the evaluation weighs intended benefits against potential harms. Informed consent is not applicable, since the research involves no human participants and draws on public regulatory text and constructed cases. Privacy safeguards are likewise not applicable, as no personal data is collected, processed, or stored. Honesty and integrity are operationalised through the ethical reporting rule set out below and through the pre-registration discipline that forecloses post hoc adjustment of success criteria. Intellectual property raises no conflict: the standard is public, and the analytical methods, schema, and notation grammar are original work. Artefact quality and safety is the most substantively applicable principle here, operationalised through the validity controls in Section 4.6, the falsifier discipline in Section 4.5, and the stakeholder-consequence analysis above; an artefact that claims to govern housing-adaptation standards carries quality and safety obligations that hold even in a research context.
The ethical reporting rule follows directly. Retention of negative results and accepted limitations across the Chapter 10 interpretation and the Chapter 11 synthesis is a binding obligation: when a proposition fails, the failure is reported with its diagnostic implications; indeterminate or boundary-limited evidence is surfaced under that classification rather than suppressed; and any governance overhead the artefact suite introduces unexpectedly is documented. The stakeholders named above are better served by honest failure than by concealed limitation.
The handoff contract
The method commitments established in Sections 4.2 to 4.6 bind the chapters that follow, and the cross-reference below records the resolution of the six methodological elements inherited from Chapter 3 Section 3.6, each now carried by a specific construct.
| Inherited Element | Resolved In | Chapter 4 Construct |
|---|---|---|
| Unit of change | Section 4.4 design-feature layer | Design features DF-05A-01 to DF-08D-02 |
| Invariant format | Section 4.5 measure specifications | Evaluation measures EM-4W-01 to EM-09-04 |
| Trigger-to-check mapping | Section 4.3 cycle table | Cycles 1 to 4 with evaluation steps |
| Baseline | Section 4.5 comparator definitions | Matched-baseline conditions per EQ |
| Evidence objects | Section 4.5 evidence linkage; Section 4.6 threat register | EVID-P1 through EVID-P5 |
| Adjudication rules | Section 4.5 adjudication and P5 operationalisation | Four-result scheme; P5 composite criterion |
Four downstream obligations are fixed. Artefact construction binds Chapters 5 to 9 to instantiate the design features under explicit mechanism objects, to demonstrate that those features satisfy their linked environment requirements, to execute Cycles 1 to 4 with documented intent, change, evaluation, and decision, and to produce the evidence objects specified in the cycle table. Summative evaluation binds Chapter 10 to execute evaluation questions EQ-01 to EQ-05, report measures EM-09-01 to EM-09-04 with linked EVID-* objects, address the four evaluation-continuity commitments, cite at least one evidence object per major claim, and apply the Proposition 5 operationalisation without modification. Contribution extraction binds Chapter 11 to produce the contribution rules TR-01 to TR-04 as scoped prescriptive outputs that remain within the evidence boundaries the evaluation establishes. Bounded transfer binds Chapter 12 to honour the three Chapter 3 boundary clarifications: representational governance does not resolve plural value conflict; modular and platform arrangements are conditional on governance quality; and no unrestricted transfer beyond the declared scope is claimed.
Each contribution rule carries an anticipated output form under the March and Smith taxonomy, specified here so that the contribution claim is not, in Chapter 11, post-hoc rationalised toward the most flattering category.43 TR-01, the semantic-continuity rule for regulatory-text serialisation, is anticipated as a method with secondary construct properties. TR-02, the interface-bounded modularity rule for governance-domain artefacts, is anticipated as a model with secondary construct properties. TR-03, the executable-grammar rule for replayable transformations under invariants, is anticipated as a method with secondary instantiation properties, demonstrated through the RecPol/PlaniSyn instantiation in Chapter 7. TR-04, the platform-governance rule for legitimate variation under shared rules, is anticipated as a model with secondary method properties, paired with an exception-budgeting method. On our reading, these four are not four independent contributions: they are the anticipated forms through which a single architectural claim is discharged, Propositions 1 to 4 warranting claims at the model and method level rather than at the lexical or implementation level. Chapter 11 is bound to honour these forms unless the evidence base forces a downgrade, in which case the downgrade is declared rather than silently presented.
The shared vocabulary anchors carried forward to every subsequent chapter are the argument spine SP-01 to SP-04; the environment requirements ER-01 to ER-06; the evaluation measures EM-4W-01 to EM-09-04; the contribution rules TR-01 to TR-04; the evidence objects EVID-P1-INTERPRETABILITY through EVID-P5-BURDEN; and the adjudication categories of pass, fail, indeterminate, and boundary-limited acceptance. A completion condition keeps the contract binding throughout the evaluation: should Chapter 10 meet a measurement situation the pre-registered protocol does not anticipate, that condition must be declared as a scope limitation rather than resolved by inventing ad hoc criteria. The first artefact chapter to instantiate the contract is Chapter 5, which applies design features DF-05A-01 to DF-05A-03 to address ER-01, ER-03, and ER-04 through the standardisation schema for the SDA Design Standard, accepting the pipeline, cycle structure, evaluation protocol, validity controls, and ethical obligations defined here.
Next chapter: Chapter 5: A Queryable Schema for Accessibility Standards →
Notes
- K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45–77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
- A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004. ↩︎
- K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45–77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
- R. L. Baskerville, “Investigating Information Systems with Action Research,” Communications of the Association for Information Systems, vol. 2, no. 1, Article 19, 1999. ↩︎
- R. K. Yin, Case Study Research and Applications: Design and Methods, 6th ed. Thousand Oaks, CA: Sage, 2018. ↩︎
- J. G. Walls, G. R. Widmeyer, and O. A. El Sawy, “Building an Information System Design Theory for Vigilant EIS,” Information Systems Research, vol. 3, no. 1, pp. 36–59, 1992, doi: 10.1287/isre.3.1.36. ↩︎
- J. F. Nunamaker, M. Chen, and T. D. M. Purdin, “Systems Development in Information Systems Research,” Journal of Management Information Systems, vol. 7, no. 3, pp. 89–106, 1991, doi: 10.1080/07421222.1990.11517898. ↩︎
- W. Kuechler and V. Vaishnavi, “On Theory Development in Design Science Research: Anatomy of a Research Project,” European Journal of Information Systems, vol. 17, no. 5, pp. 489–504, 2008, doi: 10.1057/ejis.2008.40. ↩︎
- A. R. Hevner, “A Three Cycle View of Design Science Research,” Scandinavian Journal of Information Systems, vol. 19, no. 2, pp. 87–92, 2007. ↩︎
- S. Gregor and A. R. Hevner, “Positioning and Presenting Design Science Research for Maximum Impact,” MIS Quarterly, vol. 37, no. 2, pp. 337–355, 2013, doi: 10.25300/MISQ/2013/37.2.01. ↩︎
- S. Gregor, “The Nature of Theory in Information Systems,” MIS Quarterly, vol. 30, no. 3, pp. 611–642, 2006, doi: 10.2307/25148742. ↩︎
- S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251–266, 1995, doi: 10.1016/0167-9236(94)00041-2. ↩︎
- J. Iivari, “Distinguishing and Contrasting Two Strategies for Design Science Research,” European Journal of Information Systems, vol. 24, no. 5, pp. 471–478, 2015, doi: 10.1057/ejis.2013.35. ↩︎
- J. R. Venable, “The Role of Theory and Theorising in Design Science Research,” in Proc. DESRIST 2006, pp. 1–18, 2006. ↩︎
- M. K. Sein, O. Henfridsson, S. Purao, M. Rossi, and R. Lindgren, “Action Design Research,” MIS Quarterly, vol. 35, no. 1, pp. 37–56, 2011. ↩︎
- A. R. Hevner, “A Three Cycle View of Design Science Research,” Scandinavian Journal of Information Systems, vol. 19, no. 2, pp. 87–92, 2007. ↩︎
- K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” Journal of Management Information Systems, vol. 24, no. 3, pp. 45–77, 2007, doi: 10.2753/MIS0742-1222240302. ↩︎
- A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004. ↩︎
- A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004. ↩︎
- See Appendix: RDE Traceability Matrix. ↩︎
- National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
- Australian Bureau of Statistics, Census of Population and Housing 2021: Housing, Australian Government, 2022. ↩︎
- The benchmark survey of comparable Design Science Research theses finds descriptive-only quantitative reporting; the closest descriptive-only peers are Hjelseth (2015), Solihin (2016), and Khudhair (2022). ↩︎
- A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design Science in Information Systems Research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004. ↩︎
- D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7–44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
- J. R. Venable, J. Pries-Heje, and R. Baskerville, “FEDS: A Framework for Evaluation in Design Science Research,” European Journal of Information Systems, vol. 25, no. 1, pp. 77–89, 2016, doi: 10.1057/ejis.2014.36. ↩︎
- C. Sonnenberg and J. vom Brocke, “Evaluations in the Science of the Artificial – Reconsidering the Build-Evaluate Pattern in Design Science Research,” in DESRIST 2012, Lecture Notes in Computer Science, vol. 7286, Springer, 2012, pp. 381–397. ↩︎
- S. Gregor and A. R. Hevner, “Positioning and presenting design science research for maximum impact,” MIS Quarterly, vol. 37, no. 2, pp. 337–355, 2013. ↩︎
- D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7–44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
- E. Kamsties and D. M. Berry, “Detecting ambiguities in requirements documents using inspections,” in Workshop on Inspection in Software Engineering (WISE 2001), 2001. ↩︎
- D. M. Berry and E. Kamsties, “Ambiguity in requirements specification,” in Perspectives on Software Requirements, J. C. S. P. Leite and J. H. Doorn, Eds. Boston, MA: Springer, 2004, pp. 7–44, doi: 10.1007/978-1-4615-0465-8_2. ↩︎
- K. R. Larsen, R. Baskerville, and J. R. Venable, “Theories and Models of Design Science Research Validity,” Information Systems Journal, vol. 35, no. 1, pp. 36–69, 2025. ↩︎
- T. D. Cook and D. T. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally, 1979. ↩︎
- W. M. K. Trochim and J. P. Donnelly, The Research Methods Knowledge Base, 3rd ed. Mason, OH: Atomic Dog/Cengage Learning, 2008. ↩︎
- T. D. Cook and D. T. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally, 1979. ↩︎
- W. M. K. Trochim and J. P. Donnelly, The Research Methods Knowledge Base, 3rd ed. Mason, OH: Atomic Dog/Cengage Learning, 2008. ↩︎
- T. C. Redman, “The Impact of Poor Data Quality on the Typical Enterprise,” Communications of the ACM, vol. 41, no. 2, pp. 79–82, 1998. ↩︎
- R. Y. Wang and D. M. Strong, “Beyond Accuracy: What Data Quality Means to Data Consumers,” Journal of Management Information Systems, vol. 12, no. 4, pp. 5–33, 1996, doi: 10.1080/07421222.1996.11518099. ↩︎
- K. J. Arrow, The Limits of Organization. New York: W. W. Norton, 1974. ↩︎
- P. Milgrom and J. Roberts, “The Economics of Modern Manufacturing: Technology, Strategy, and Organization,” American Economic Review, vol. 80, no. 3, pp. 511–528, 1990. ↩︎
- National Disability Insurance Agency, NDIS Specialist Disability Accommodation Design Standard, Australian Government, 2019. ↩︎
- M. D. Myers and J. R. Venable, “A Set of Ethical Principles for Design Science Research in Information Systems,” Information & Management, vol. 51, no. 6, pp. 801–809, 2014, doi: 10.1016/j.im.2014.01.002. ↩︎
- S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251–266, 1995, doi: 10.1016/0167-9236(94)00041-2. ↩︎