modumatics Modular Infrastructure for Inclusive Housing Tran Thien Toan Ngo · PhD Dissertation

1 Purpose and Scope

The artefact suite presented in Chapters 5 through 10 is the v6.0 state of a research programme whose architectural commitments and methodological choices have evolved through prior iterations. This appendix documents two evolutions that materially shape the current artefacts: the Bunnings corpus processing pipeline (the dimensional-analysis methodology of Chapter 8) and the notation conceptual framing (the two-layer architecture of Chapter 7). Both evolutions involved earlier approaches that were investigated, partially executed, and superseded by the approaches now reported in the main text. The earlier approaches are documented here for transparency: the examiner can trace the architectural-thinking trajectory rather than encounter the v6.0 state as if it were the only state ever considered.

The appendix does not relitigate the earlier approaches. Where an earlier approach is shown, the appendix explains what it did, why it was investigated, and what drove the move to the current approach. Cross-references to the current main-text treatments are provided so the reader can navigate between the earlier and current methods without re-reading the canonical chapters. The transparency commitment served by this appendix is consistent with the methodology chapter’s and discussion chapter’s commitments to documenting evidential trajectory honestly, including the approaches that did not produce the contribution but informed the search for the approach that did.

The body of the appendix is organised in two sections. §2 documents the methodology evolution of the Bunnings corpus processing pipeline, with two earlier-approach figures and a cross-reference to the current Chapter 8 §8.21 treatment. §3 documents the notation conceptual evolution from a four-layer nested-onion framing to the current two-layer functor architecture, with one earlier-framing figure and a cross-reference to Chapter 7 §7.3. §4 closes with the appendix’s position on why the trajectory is reported.


2 Methodology Evolution — Bunnings Corpus Processing Pipeline

The Chapter 8 dimensional analysis (§8.21) operates over a 3,592-instance cleaned sample of the Bunnings Australian retail product corpus. The substantive corpus is the same Bunnings catalogue dataset across the methodology evolution documented here: it is the dataset the current composite-coverage sweep evaluates and the dataset the earlier methodology evaluated. What changed is the cleaning logic that produces the cleaned sample and the scoring logic that selects the module candidate from it.

2.1 Earlier Cleaning Approach — Tukey IQR Outlier Fences with k = 8.0

The November 2025 cleaning pipeline applied Tukey IQR outlier fences with a configurable k value (default 8.0) to remove extreme dimension values from the raw catalogue. The pipeline computed Q1, Q3, and the inter-quartile range over the unified pool of dimensions across the four field types (width, length, height, thickness), then marked any dimension outside [Q1 − k·IQR, Q3 + k·IQR] as outranged. Outranged values were set to missing rather than removed entirely, so that a product retaining at least one valid dimension after fence-application was still classified as “cleaned” rather than “no-dimension”. The same fence parameters were applied to all four field types, on the principle that a single coherent fence is more interpretable than per-field fences. An optional hard cap (default none) provided a second exclusion gate above the IQR fence.

Figure. Earlier cleaning pipeline (November 2025): Tukey IQR outlier fences with k = 8.0.

Six-block workflow specifying the November 2025 Bunnings cleaning pipeline. The Initial Setup block fixes the four cleaning fields (width, length, height, thickness), the zero/negative policy (default reject), the hard cap (default none), and the Tukey k value (default 8.0 IQR). Data Preparation coerces dimensions to number, removes non-positive and non-finite values, and counts valid and missing fields per product. Unified Pool Construction collects all valid dimensions across all four fields without deduplication and excludes any dimension above the hard cap if set. Outlier Fence Calculation computes Q1, Q3, and IQR over the unified pool, sets the lower fence to Q1 − k·IQR and the upper fence to Q3 + k·IQR, and applies the same fence to all four fields. Cleaning Details retains in-fence values, marks out-of-fence values as outranged and sets them to missing, and tracks all outranged dimensions with context. Classification separates the resulting product records into Cleaned (at least one valid dimension) and No-dimension (all four missing). The figure captures the pipeline as exercised on 2025-08-27; the current cleaning approach used by the §8.21 dimensional analysis is documented in Chapter 8 §8.10 (Corpus Construction Method).

The k = 8.0 default reflected a deliberately permissive fence: a typical Tukey IQR analysis uses k = 1.5 for outlier flagging and k = 3.0 for “far out” classification. The choice of k = 8.0 was made to remove only the most extreme catalogue artefacts (data-entry errors, products with implausible dimensions such as a 50-metre door) while retaining the long tails that legitimate residential construction products produce. The wide fence was tractable in the November 2025 corpus but raised two interpretability concerns. First, the choice of k was not derived from a principled criterion; it was set by inspection of the empirical distribution. Second, the same fence applied to four different field types (each with their own dimensional regime) presupposed comparability across fields that the cleaning step did not verify. Both concerns motivated re-examination of the cleaning approach prior to the v6.0 dimensional analysis.
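The fence logic described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis code: the function and field names (`clean_products`, `is_valid`, the `id` key) are hypothetical, the quartile convention (`method="inclusive"`) is an assumption because the text does not name one, and treating above-cap values as outranged is one reading of the "second exclusion gate".

```python
import math
import statistics

FIELDS = ("width", "length", "height", "thickness")


def is_valid(value):
    """A usable dimension is a finite, strictly positive number."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and math.isfinite(value) and value > 0)


def clean_products(products, k=8.0, hard_cap=None):
    """Apply one unified Tukey fence to all four dimension fields."""
    # Unified pool: every valid dimension from every field, no deduplication.
    pool = [p[f] for p in products for f in FIELDS
            if is_valid(p.get(f)) and (hard_cap is None or p[f] <= hard_cap)]
    # Quartile convention is an assumption; the source does not name one.
    q1, _, q3 = statistics.quantiles(pool, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    cleaned, no_dimension, outranged = [], [], []
    for p in products:
        record = dict(p)
        for f in FIELDS:
            v = p.get(f)
            if not is_valid(v):
                record[f] = None  # missing or rejected on input
            elif v < lower or v > upper or (hard_cap is not None and v > hard_cap):
                record[f] = None  # outranged: set to missing, not dropped
                outranged.append((p.get("id"), f, v))  # keep context for audit
        # A product stays "cleaned" if at least one dimension survives.
        has_dimension = any(record[f] is not None for f in FIELDS)
        (cleaned if has_dimension else no_dimension).append(record)
    return {"fence": (lower, upper), "cleaned": cleaned,
            "no_dimension": no_dimension, "outranged": outranged}
```

Because the pool is unified, a single extreme value in one field widens or narrows the fence for every field, which is the comparability assumption the second concern above questions.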

2.2 Earlier Module-Selection Approach — Divisibility Scoring with LCM-Based Meso-Tier Extension

The November 2025 module-selection methodology operated in two tiers. The micro-tier evaluated candidate modules in the 19–296 mm range, ranking each candidate by its divisibility score: the proportion of corpus dimensions that the candidate exactly divides. A candidate was admitted only if it was not divisible by any earlier candidate, so that trivial multiples (for example 38 mm after 19 mm) did not compete as separate candidates; each discarded multiple was linked back to the earlier candidate that divides it. The top five candidates from the micro-tier were then passed to the meso-tier extension. The meso-tier combined the top five into pairs, triples, and quadruples, calculated the least common multiple (LCM) for each combination, removed any LCM that exceeded 50,000 mm, duplicated an earlier combination, or was overshadowed by a simpler one, and scored the remaining LCMs by the same divisibility criterion. The procedure terminated by ranking the meso-tier results into a final ordered list.

Figure. Earlier module-selection pipeline (November 2025): divisibility scoring with LCM-based meso-tier extension.

Five-block workflow specifying the November 2025 module-selection methodology. Data Preparation collects all dimension values from the cleaned corpus, excludes values exceeding 10,000 mm, rounds remaining values to the nearest whole number, retains all instances without deduplication, and emits the prepared dataset. Candidate Module Generation iterates over the 19–296 mm integer range, keeps values not divisible by earlier candidates, discards multiples, and records the divisibility link to the earlier candidate. Module Scoring tests each surviving candidate’s divisibility across the full dataset and records both a raw score (count of corpus dimensions divided) and a percentage. Ranking sorts candidates by score and produces an ordered micro-tier list, from which the top five candidates are passed to the next stage. Meso-tier Extension combines the top five into pairs, triples, and quadruples, calculates the LCM for each combination, removes combinations whose LCM exceeds 50,000 mm or that are duplicated or overshadowed by simpler combinations, scores the remaining LCMs using the same divisibility criterion, and produces a ranked meso-tier list. The figure captures the pipeline as exercised on 2025-08-27; the current module-selection approach used by §8.21 is documented in Chapter 8 §8.21 (The Composite-Coverage Sweep) and the subsampling stability test in §8.22.
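The micro-tier blocks (Data Preparation, Candidate Module Generation, Module Scoring, Ranking) can be sketched as follows. The function names are hypothetical. The tie-break on smaller module, the dropping of values that round to zero, and the use of Python's built-in `round` (which rounds exact halves to the nearest even integer) are assumptions that the caption does not settle.

```python
def prepare(dimensions, max_mm=10_000):
    """Drop values above max_mm, round to whole millimetres, keep duplicates."""
    # Values that round to 0 are dropped (assumption): 0 is divisible by everything.
    return [r for d in dimensions if d <= max_mm and (r := round(d)) > 0]


def generate_candidates(lo=19, hi=296):
    """Keep each integer in [lo, hi] that no earlier kept candidate divides."""
    kept, links = [], {}
    for m in range(lo, hi + 1):
        divisor = next((c for c in kept if m % c == 0), None)
        if divisor is None:
            kept.append(m)
        else:
            links[m] = divisor  # divisibility link to the earlier candidate
    return kept, links


def score_candidates(candidates, data):
    """Return (module, raw count, percentage) rows ranked by raw count."""
    rows = []
    for m in candidates:
        raw = sum(1 for d in data if d % m == 0)
        rows.append((m, raw, 100.0 * raw / len(data)))
    # Ties go to the smaller module (assumption).
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def micro_tier(dimensions, top_n=5):
    """Run the four micro-tier blocks and return the top-ranked rows."""
    data = prepare(dimensions)
    candidates, _ = generate_candidates()
    return score_candidates(candidates, data)[:top_n]
```

For instance, `generate_candidates()` keeps 19 and 25 but discards 38 and 50, linking them to 19 and 25 respectively.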

The divisibility-and-LCM methodology had two structural weaknesses that emerged on closer examination. First, the divisibility criterion alone could not discriminate between a candidate that achieves wide coverage by happenstance (because the corpus contains many multiples of small integers) and a candidate that achieves coverage by co-ordinating with the corpus’s preferred-number structure. Two candidates with similar divisibility scores could carry substantively different co-ordinating utility, and the methodology had no way to surface that difference. Second, the LCM-based meso-tier extension generated combinations whose interpretability was thin: the LCM of 25 mm, 50 mm, 100 mm, and 300 mm is 300 mm, so that four-way combination is dominated by the 300 mm candidate alone, and the combinatorial expansion produced many such trivially dominated combinations. The meso-tier filtering removed the most obvious dominations but did not eliminate the underlying issue that LCM-based combination was the wrong operator for testing multimodule co-ordination.
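The domination problem is easy to reproduce. The sketch below is a hypothetical reading of the meso-tier filter: it enumerates pairs, triples, and quadruples, drops LCMs above the cap, and separates combinations whose LCM merely restates a single candidate or an earlier combination. The function name and the exact meaning of "overshadowed" are assumptions. `math.lcm` with several arguments requires Python 3.9 or later.

```python
from itertools import combinations
from math import lcm  # multi-argument lcm needs Python 3.9+


def meso_tier(top_candidates, max_lcm=50_000):
    """Split LCM combinations into novel ones and trivially dominated ones."""
    # An LCM equal to a single candidate adds nothing beyond that candidate.
    seen = set(top_candidates)
    novel, dominated = [], []
    for size in (2, 3, 4):
        for combo in combinations(top_candidates, size):
            value = lcm(*combo)
            if value > max_lcm:
                continue  # above the 50,000 mm cap
            if value in seen:
                dominated.append((combo, value))  # restates something simpler
            else:
                seen.add(value)
                novel.append((combo, value))
    return novel, dominated


# The example from the text, plus one extra value (40) for contrast.
novel, dominated = meso_tier([25, 50, 100, 300, 40])
```

With these inputs the four-way combination (25, 50, 100, 300) lands in `dominated` with LCM 300, while (25, 40) yields a genuinely new value of 200.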

2.3 Current Approach — Composite-Coverage Sweep with Subsampling Stability

The current Chapter 8 §8.21 methodology evaluates candidate modules using a composite score that integrates three coverage metrics: hard coverage (exact divisibility), soft coverage (within ±12.5% of nearest multiple), and ISO 2848 conformance (whether the candidate is an M/n sub-module of the 100 mm basic module M). The composite score is computed as S(m) = w₁·C(m) + w₂·F(m) + w₃·R(m) with weights w₁ = 1.0, w₂ = 0.5, w₃ = 0.5, evaluated exhaustively over m ∈ {25, 26, …, 296} mm against the full 3,592-instance corpus, with the composite-coverage knee identified as the argmax. Subsampling stability is evaluated by fifty independent stratified resamples at 80% retention (n = 2,874 per resample), each of which re-runs the full composite-coverage sweep. The full method is reported in §8.21, the subsampling stability test in §8.22, and the current cleaning pipeline that produces the 3,592-instance corpus in §8.10.
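A minimal sketch of the composite score follows, under several loudly stated assumptions. The mapping of C, F, and R to hard coverage, soft coverage, and ISO 2848 conformance follows the order in which the paragraph lists them. The ±12.5% tolerance is read as a fraction of the module m, measured to the nearest positive multiple. Conformance is reduced to a 0/1 indicator that m divides 100 mm. The function names and the tie-break on smaller m are hypothetical. §8.21 remains the authoritative definition.

```python
def hard_coverage(m, data):
    """C(m): share of dimensions that m divides exactly."""
    return sum(1 for d in data if d % m == 0) / len(data)


def soft_coverage(m, data, tol=0.125):
    """F(m): share of dimensions within tol * m of a positive multiple of m."""
    def near(d):
        r = d % m
        # Below m the only positive multiple to compare against is m itself.
        dist = m - r if d < m else min(r, m - r)
        return dist <= tol * m
    return sum(1 for d in data if near(d)) / len(data)


def iso_conformance(m, basic_module=100):
    """R(m): 1.0 if m is an M/n sub-module of the basic module (assumption)."""
    return 1.0 if basic_module % m == 0 else 0.0


def composite(m, data, weights=(1.0, 0.5, 0.5)):
    """S(m) = w1*C(m) + w2*F(m) + w3*R(m)."""
    w1, w2, w3 = weights
    return (w1 * hard_coverage(m, data)
            + w2 * soft_coverage(m, data)
            + w3 * iso_conformance(m))


def sweep(data, lo=25, hi=296):
    """Score every integer module in [lo, hi]; return the argmax and all scores."""
    scores = {m: composite(m, data) for m in range(lo, hi + 1)}
    best = max(scores, key=lambda m: (scores[m], -m))  # smaller m wins ties
    return best, scores
```

On a toy list whose values are all multiples of 25 but not all multiples of 50, such as `[600, 1200, 2400, 900, 450, 75, 125, 1800]`, the sweep returns 25 with the maximum attainable score of 2.0. This says nothing about the real corpus and only confirms that the arithmetic behaves as described.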

2.4 What Drove the Evolution

Three drivers moved the methodology from the earlier cleaning and module-selection pipelines (the two earlier-approach figures above) to the current Chapter 8 §8.21 approach. First, the composite-score methodology introduced an explicit weight policy that the divisibility-only score lacked, allowing the dimensional analysis to be reproduced with declared rather than implicit priorities. Second, the composite-coverage knee replaced the divisibility rank as the selection criterion; the knee has precedent in the modular product-family literature and admits an interpretable visual diagnostic (the composite-coverage knee shown in the 4D composite-coverage curve for module candidate selection). Third, the subsampling stability test replaced informal stability checks with a fifty-resample procedure that records how often the composite-coverage knee recurs across resamples. The recurrence figure is a plain count over the complete corpus, with no interval estimate or significance test attached. Together, the three changes lifted the dimensional analysis from a single-pass divisibility ranking to a composite-coverage study whose selected module is shown to recur under resampling. This is the form in which Chapter 8 reports the m* = 25 mm finding and the associated claims CL-8-01, CL-8-02, and CL-8-03.
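The recurrence count named as the third driver can be sketched generically. The stratification variable is not stated in this appendix, so `stratum_of` is a hypothetical callback, as are the function names and the fixed seed. `select_module` stands for whatever selection procedure is being tested, for example a full composite-coverage sweep.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(records, stratum_of, retention, rng):
    """Draw retention * size records from each stratum, without replacement."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(retention * len(members))))
    return sample


def knee_recurrence(records, select_module, stratum_of,
                    n_resamples=50, retention=0.8, seed=0):
    """Count how often each module is selected across the resamples."""
    rng = random.Random(seed)  # fixed seed so the count is reproducible
    counts = Counter()
    for _ in range(n_resamples):
        sample = stratified_subsample(records, stratum_of, retention, rng)
        counts[select_module(sample)] += 1
    return counts  # a plain count: no interval estimate, no significance test


# Toy usage: records are (field, value) pairs; the selector picks between
# two hypothetical modules by raw divisibility.
records = [("width", 25)] * 20 + [("length", 50)] * 20 + [("width", 30)] * 10
selector = lambda s: max((25, 30), key=lambda m: sum(v % m == 0 for _, v in s))
counts = knee_recurrence(records, selector, stratum_of=lambda r: r[0])
```

In this toy case the 25 mm option wins every resample because it divides at least 30 of the 40 retained values, so `counts[25]` is 50.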


3 Notation Conceptual Evolution — From Nested Onion to Two-Layer Functor Architecture

The Chapter 7 notation system is presented as a two-layer architecture (RecPol formal core and PlaniSyn applied grammar) connected by a category-theoretic wrapping functor W: G → A (§7.3). An earlier conceptual framing represented the same architectural question as a four-layer nested-onion structure with an informal outer “ontological mapping” layer. The earlier framing did not survive into v6.0 because its outer layer lacked a formal mechanism; the current architecture provides the formal mechanism (the functor plus the extension-point token discipline) that the earlier framing only named.

3.1 Earlier Conceptual Framing (November 2025)

The November 2025 conceptual sketch organised the notation architecture as four concentric layers with an external linkage. The innermost layer was a specific module dimension (labelled unit module 150, corresponding to the 150 mm module candidate from the dimensional analysis’s meso-tier sweep); the next layer outward was RECTANGULAR POLYOMINO (the formal-core syntax now called RecPol); the next was PLANIMATIC SYNTAX (the applied layer now called PlaniSyn); and the outermost was ONTOLOGICAL MAPPING (the bridge to external systems). A bidirectional dashed link connected the outer layer to a separate OTHER SYSTEMS element.

Figure. Earlier conceptual framing (November 2025): four-layer nested-onion architecture with an “ontological mapping” outer layer.

November 2025 working conceptualisation of the notation architecture as four nested concentric layers. From innermost to outermost: a specific module dimension (unit module 150, anchored to the 150 mm candidate from the early dimensional analysis); the rectangular-polyomino layer (RECTANGULAR POLYOMINO, what is now called RecPol per Chapter 7); the applied-grammar layer (PLANIMATIC SYNTAX, what is now called PlaniSyn); and a notional outer-bridge layer (ONTOLOGICAL MAPPING). A bidirectional dashed link connects the outer layer to a separate dashed circle labelled OTHER SYSTEMS, representing the conceptual interface between the thesis’s notation and external standards. The figure preserves the working terminology of the November 2025 sketch; current thesis vocabulary differs (see §3.2). The current two-layer functor architecture is documented in Chapter 7 §7.3 and visualised in the two-layer architecture figure.

The earlier framing carried two specific commitments that did not survive the architectural review. First, it conflated the module-dimension question (which belongs to Chapter 8’s empirical analysis) with the notation-architecture question (which belongs to Chapter 7’s formal specification); the unit module 150 innermost layer was an empirical commitment masquerading as a notation-architecture commitment. Second, the ONTOLOGICAL MAPPING outer layer named a bridge function without specifying a mechanism — there was no functor, no extension contract, no preservation property; the outer layer was a label rather than a structure.

3.2 Current Architecture — Two-Layer with Wrapping Functor

The current Chapter 7 architecture separates module dimension from notation architecture (the 25 mm base module and the 150 mm meso-tier candidate are evaluated in Chapter 8 §8.21 and the dimensional library is specified in §8.26; neither appears in the notation architecture diagram) and replaces the informal outer-bridge layer with a category-theoretic wrapping functor W: G → A whose preservation properties are formally specified. The functor preserves identity (W(id_X) = id_{W(X)}), composition (W(g ∘ f) = W(g) ∘ W(f)), and semantic derivability (every PlaniSyn assertion is derivable from the composition of RecPol operations and the functor’s vocabulary binding). Six reserved extension-point tokens (OPEN_PAREN, CLOSE_PAREN, OPEN_BRACKET, CLOSE_BRACKET, DOT, AT) and ten extension contracts (EC-01..EC-10b per EXP-7.6) provide the formal channels through which applied-grammar layers introduce domain-specific syntax without modifying the formal core. The full specification is in Chapter 7 §7.3, with the two-layer notation architecture figure visualising the architecture; the wrapping functor’s formal definition is in the RecPol Specification appendix §12 and the PlaniSyn Grammar appendix §11.
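The identity and composition properties can be illustrated with a toy functor. The sketch below is not the thesis's W. It assumes a stand-in in which a "wrapped" object is a (label, payload) pair and W lifts a function on payloads to a function on wrapped objects. All names are hypothetical, and checking the laws pointwise on a few samples illustrates them without proving them.

```python
def identity(x):
    return x


def compose(g, f):
    """Return g after f."""
    return lambda x: g(f(x))


def W(f):
    """Toy wrapping functor: apply f to the payload, leave the label alone."""
    return lambda wrapped: (wrapped[0], f(wrapped[1]))


def check_functor_laws(functor, f, g, samples):
    """Check identity and composition preservation on the given samples."""
    for s in samples:
        # W(id_X) = id_{W(X)}
        assert functor(identity)(s) == identity(s)
        # W(g . f) = W(g) . W(f)
        assert functor(compose(g, f))(s) == compose(functor(g), functor(f))(s)
    return True


# Toy morphisms on rectangles given as (width, height) in module units.
double_width = lambda rect: (rect[0] * 2, rect[1])
rotate = lambda rect: (rect[1], rect[0])
samples = [("room", (2, 3)), ("corridor", (1, 4))]
laws_hold = check_functor_laws(W, double_width, rotate, samples)
```

The third property in the paragraph, semantic derivability, is a statement about the grammar and its vocabulary binding, so a check of this kind does not capture it.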

3.3 What Drove the Evolution

Two drivers moved the conceptual framing from the earlier nested-onion sketch to the current two-layer functor architecture. First, the formal-mechanism requirement (every architectural layer must specify what it does rather than merely name what it is) disqualified the informal ONTOLOGICAL MAPPING outer layer and pushed the notation system toward a functor formulation whose preservation properties can be stated in category-theoretic terms (Mac Lane 1998, cited in the caption of the two-layer notation architecture figure). Second, the separation-of-concerns principle, formalised in the modularity theory of Chapter 3 §3.3 (interaction-density asymmetry) and operationalised in Chapter 6’s interface contracts (semantic, transformation, verification), required that empirical commitments (module dimensions) and structural commitments (notation layers) be located in their respective evidence chains rather than collapsed into a single visual artefact. The two-layer functor architecture satisfies both requirements; the four-layer onion satisfied neither.


4 Closing Position

This appendix documents architectural-thinking evolution as part of the thesis’s transparency commitment, not as an apology for revision. The earlier cleaning and module-selection methodology was a defensible exploratory pipeline whose limitations became visible only under the scrutiny that the current Chapter 8 methodology was designed to withstand; the earlier conceptual framing was a defensible working sketch whose informality became visible only when the current Chapter 7 architecture demanded a formal functor. The earlier approaches are documented here in their own terms, with the cross-references that allow the examiner to navigate to the current main-text treatments. The thesis’s contribution is the v6.0 state of the artefact suite; the trajectory from the November 2025 working state to v6.0 is part of the evidence that the contribution was reached through honest engagement with methodological alternatives rather than by selecting the first defensible approach and committing to it without re-examination.