modumatics Modular Infrastructure for Inclusive Housing Tran Thien Toan Ngo · PhD Dissertation

Trigram and Part-of-Speech Structural Metrics

Trigram and part-of-speech structural analysis is documented here for the standards representation corpus in Chapter 5. The analysis was performed on 605 processed clauses to characterise the phrase-level structural patterns that differentiate clause classes and establish the structural regularity needed for schema design. Both aggregate counts and per-class dominant patterns are reported.

Six aggregate metrics are reported for the structural analysis corpus.

| Metric | Value |
| --- | --- |
| Processed clauses | 605 |
| Distinct trigrams extracted | 4,556 |
| Retained trigrams (minimum frequency threshold) | 568 |
| Dominant applicability-frame trigram | clause applicable to |
| Dominant requirement-frame trigram | clause design requirement |
| Dominant rationale-frame trigram | clause rationale for |

Of the 4,556 distinct trigrams extracted, 568 are retained (approximately 12.5 per cent). This ratio reflects the high phrase-level variation characteristic of regulatory documents, where clause wording is rarely standardised below the clause-class level. The retained trigrams are recurrent phrase structures that appear across multiple clauses and carry interpretable structural signals. The dominant trigrams listed above identify the lexical anchors through which the three primary clause classes are consistently marked in the corpus.

Overall, the trigram profile confirms that clause-class recovery is feasible for the three primary frames. The residual other class lacks a dominant structural marker and cannot be recovered from phrase structure alone. The next section documents the clause-class structural roles that the trigram analysis informs.
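The extraction and retention step described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the dissertation: whitespace tokenisation, the function names, the toy clauses, and the threshold value of 2 are all assumptions, since the actual tokeniser and minimum frequency threshold are not stated here.

```python
from collections import Counter


def extract_trigrams(tokens):
    """Return every window of three consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def count_and_retain(clauses, min_freq):
    """Count trigrams over all clauses and keep those seen at least min_freq times."""
    counts = Counter()
    for clause in clauses:
        tokens = clause.lower().split()  # assumption: simple whitespace tokenisation
        counts.update(extract_trigrams(tokens))
    retained = {tri: n for tri, n in counts.items() if n >= min_freq}
    return counts, retained


# Toy clauses invented for illustration; they are not drawn from the corpus.
toy_clauses = [
    "clause applicable to accessible entrances",
    "clause applicable to shared corridors",
    "clause rationale for minimum door width",
]

# min_freq=2 is a hypothetical threshold chosen only for this toy example.
counts, retained = count_and_retain(toy_clauses, min_freq=2)
toy_retention_ratio = len(retained) / len(counts)

# The reported corpus ratio, as a percentage rounded to one decimal place.
reported_ratio_pct = round(568 / 4556 * 100, 1)
```

In the toy input only the trigram shared by two clauses survives the threshold, which mirrors, at a very small scale, why the retained set is much smaller than the distinct set.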

Clause-Class Structural Profile

Four clause classes are identified by the structural analysis. Each class carries a distinct structural role in interpretation, and conflating classes produces inferential errors in downstream schema assignment.

| Clause Class | Structural Role |
| --- | --- |
| design_requirement | Normative design force; prescriptive statements that the schema must preserve as requirement-bearing |
| rationale | Explanatory support and justification; carries no direct requirement force and must not be treated as equivalent to design requirements |
| applicable_to | Scope or applicability condition defining the domain within which a requirement applies |
| other | Residual clause class for statements that do not fit the three primary frames; requires explicit annotation |
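As an illustration of how the dominant frame trigrams could inform class assignment, the sketch below suggests a clause class from the three reported anchor trigrams and falls back to the residual class when none is present. The enum, the function name `suggest_clause_class`, and the whitespace tokenisation are assumptions made for this example. Because clause class is only partially recoverable from phrase structure, the result should be read as a suggestion to be confirmed by explicit annotation, not as a final assignment.

```python
from enum import Enum


class ClauseClass(Enum):
    DESIGN_REQUIREMENT = "design_requirement"
    RATIONALE = "rationale"
    APPLICABLE_TO = "applicable_to"
    OTHER = "other"


# Dominant frame trigrams reported for the corpus, mapped to their clause class.
ANCHOR_TRIGRAMS = {
    ("clause", "design", "requirement"): ClauseClass.DESIGN_REQUIREMENT,
    ("clause", "rationale", "for"): ClauseClass.RATIONALE,
    ("clause", "applicable", "to"): ClauseClass.APPLICABLE_TO,
}


def suggest_clause_class(text):
    """Suggest a class from the first anchor trigram found; default to OTHER."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenisation
    for i in range(len(tokens) - 2):
        match = ANCHOR_TRIGRAMS.get(tuple(tokens[i:i + 3]))
        if match is not None:
            return match
    # No dominant marker: the residual class still requires explicit annotation.
    return ClauseClass.OTHER
```

Returning `ClauseClass.OTHER` when no anchor is found reflects the observation that the residual class has no dominant structural marker of its own.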

Interpretation

Three interpretive conclusions are drawn from the structural metrics in Chapter 5.

First, clause framing performs semantic work that cannot be ignored in representation design. The dominant trigrams demonstrate that structural markers are systematically associated with clause class. Clause class is therefore partially recoverable from phrase structure, but it must be explicitly assigned rather than assumed uniform.

Second, structural ambiguity is pattern-level as well as term-level. The high count of distinct trigrams (4,556) relative to the retained set (568) shows that phrase-level variation is substantial, so ambiguity arises from structural sources as well as lexical ones.

Third, clause class is mandatory schema metadata for avoiding inferential conflation. Without explicit class assignment, design requirements, rationale statements, and applicability conditions carry equal weight in downstream processing, and representation errors accumulate through the artefact chain as a result.

Taken together, these three conclusions establish that the clause-class annotation layer is a substantive design requirement grounded in measurable structural properties of the corpus. The trigram and part-of-speech evidence presented here provides the structural justification for the clause-class field, which the standards serialisation schema mandates as required metadata for every governed row.
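The requirement that every governed row carries an explicit clause class can be sketched as a simple validation check. This is a hypothetical illustration only: the `GovernedRow` structure and its `clause_id` and `text` fields are invented for the example, and only the four clause-class labels come from the structural profile above. The actual serialisation schema may be organised differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four clause classes identified by the structural analysis.
ALLOWED_CLAUSE_CLASSES = {"design_requirement", "rationale", "applicable_to", "other"}


@dataclass
class GovernedRow:
    clause_id: str                       # hypothetical identifier field
    text: str                            # hypothetical clause text field
    clause_class: Optional[str] = None   # required metadata; None means unassigned


def validate_row(row: GovernedRow) -> List[str]:
    """Return a list of validation errors; an empty list means the row is valid."""
    errors = []
    if row.clause_class is None:
        errors.append(f"{row.clause_id}: clause_class is missing")
    elif row.clause_class not in ALLOWED_CLAUSE_CLASSES:
        errors.append(f"{row.clause_id}: unknown clause_class {row.clause_class!r}")
    return errors


# Toy rows invented for illustration.
valid_row = GovernedRow("c-001", "clause rationale for minimum door width", "rationale")
unassigned_row = GovernedRow("c-002", "clause applicable to shared corridors")
```

Rejecting an unassigned row, rather than defaulting it to a class, keeps rationale statements and applicability conditions from silently acquiring requirement force downstream.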