Ghazouani, M. (2026) “Beyond Statistical Correlation: Introducing Ontological Attention For Culturally-Grounded AI,” The Ilantic Journal.
DOI: https://doi.org/10.5281/zenodo.18477392
The meteoric rise of transformer-based artificial intelligence systems has produced capabilities that would have seemed fantastical merely a decade ago. Models generate photorealistic images from text descriptions, engage in sophisticated dialogue, and produce creative content across modalities with remarkable fluency. Yet beneath this technical triumph lies a persistent conceptual limitation that becomes apparent not when examining what these systems can do, but when scrutinizing what they fundamentally understand. A recent position paper by Momen Ghazouani articulates this limitation through what he terms the "Ontological Gap": the systematic absence in current AI architectures of mechanisms that encode the historical genesis, cultural embeddedness, and existential purpose of concepts beyond their statistical correlations in training data.
The paper introduces "Teleo-Transformers," a proposed architectural framework that would augment traditional attention mechanisms with what Ghazouani calls "Causal Embeddings." These embeddings would link vocabulary and visual concepts to their etymological roots, historical contexts, and what Aristotle termed their "final cause": their purpose or reason for being. The framework is explicitly presented as conceptual rather than empirical, establishing theoretical foundations rather than reporting experimental validation. This positioning invites examination of whether the ontological gap represents a genuine architectural limitation or whether it conflates distinct problems that might be addressed through alternative approaches.
The core argument rests on a philosophical distinction between correlation and constitution. Current systems learn that certain elements appear together in training data: Ottoman domes frequently co-occur with the word "mosque" in image-caption pairs, and "resistance" appears in textual proximity to "conflict" and "violence." Ghazouani argues that these distributional patterns, however statistically accurate, fail to capture constitutive relationships, that is, the features that make a concept what it is rather than merely what it appears near. An Andalusian mosque, in this view, is not defined by the statistical aggregation of Islamic architectural elements but by a specific historical and cultural synthesis: horseshoe arches inherited from Visigothic architecture, red-and-white voussoirs reflecting Roman influences, and decorative elements specific to Umayyad builders working in medieval Iberia. Similarly, "resistance" derives ontological meaning not from its proximity to violence in news corpora, but from its etymological root in "standing upright" and its deployment in contexts asserting human dignity against forces of negation.
This distinction raises immediate questions about what precisely constitutes ontological knowledge and how it differs operationally from sophisticated distributional learning. The paper draws heavily on phenomenological philosophy, particularly Heidegger's concept of language as the "house of Being" and Gadamer's notion of "effective history," to argue that meaning emerges from historical sedimentation rather than synchronic correlation. Words carry their histories with them; architectural forms embody cultural syntheses that occurred in specific times and places. Yet translating these philosophical insights into computational mechanisms requires specifying what information ontological embeddings would contain that distributional embeddings systematically lack.
The proposal envisions constructing Causal Embeddings from multiple heterogeneous sources: etymological databases tracing words to historical roots, historical corpora revealing how concepts were articulated in existentially significant contexts, cultural knowledge bases encoding constitutive rather than correlative relationships, and philosophical literature providing theoretical frameworks for understanding concepts' purposes and grounds. These embeddings would operate alongside traditional distributional representations, with attention mechanisms computing both distributional relevance (which tokens co-occur frequently) and ontological relevance (which tokens share etymological roots, historical contexts, or existential purposes). The model would combine these dual attention maps to produce representations that are both distributionally informed and ontologically grounded.
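One way to picture the dual attention described above is the minimal sketch below. It is not the paper's implementation, which remains conceptual; it simply assumes that each token carries two vectors (a distributional one and a hypothetical "ontological" one), computes an ordinary scaled dot-product attention map in each space, and blends the two maps with a mixing weight. The function names, the `gate` parameter, and the toy vectors are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def dual_attention(dist_query, dist_keys, onto_query, onto_keys, values, gate=0.5):
    """Blend a distributional and an ontological attention map.

    `gate` is a hypothetical mixing weight in [0, 1]: 0 uses only the
    distributional map, 1 uses only the ontological map. In a trained
    model it would presumably be learned; here it is a fixed assumption.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    w_dist = attention_weights(dist_query, dist_keys)
    w_onto = attention_weights(onto_query, onto_keys)
    # A convex combination of two probability vectors is still a probability vector.
    mixed = [(1.0 - gate) * d + gate * o for d, o in zip(w_dist, w_onto)]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(mixed, values)) for i in range(dim)]
    return mixed, output


# Toy example with three context tokens; all vectors are made up for illustration.
dist_keys = [[1.0, 0.0], [0.2, 0.9], [0.9, 0.3]]   # co-occurrence-style vectors
onto_keys = [[0.0, 1.0], [1.0, 0.1], [0.1, 0.2]]   # hypothetical "causal" vectors
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = dual_attention([1.0, 0.0], dist_keys, [1.0, 0.0], onto_keys, values)
print(weights, context)
```

Blending the two attention maps rather than the two embedding spaces keeps the distributional pathway untouched, which makes the sketch easy to compare against a baseline by setting `gate` to zero. Whether a blend of this kind would actually constrain generation is exactly the open question the following paragraphs raise.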
The technical feasibility of this approach merits careful consideration. Constructing comprehensive ontological knowledge bases represents an undertaking of considerable magnitude, requiring systematic extraction and formalization of knowledge that exists primarily in scholarly literature, cultural documentation, and community practice. While resources like WordNet and ConceptNet provide some infrastructure for semantic relationships, they focus predominantly on synchronic connections rather than diachronic ontological grounds. Etymology databases exist for many languages, but linking etymological information to computational embeddings that transformers can attend to during generation poses non-trivial engineering challenges. More fundamentally, the proposal assumes that ontological knowledge can be specified with sufficient precision and completeness to operationally constrain generation in ways that distributional learning cannot replicate.
This assumption warrants scrutiny. Consider the Andalusian mosque example central to the paper's argument. Ghazouani claims that horseshoe arches are constitutive of Andalusian architectural identity while Ottoman domes are ontologically incompatible. This reflects genuine architectural historical knowledge: horseshoe arches do characterize Umayyad architecture in Iberia, while Ottoman influence did not extend to medieval Islamic Spain. But how would this knowledge be formalized? One might encode rules specifying required and forbidden features for architectural categories, but this risks brittle symbolic approaches that proved limiting in earlier AI paradigms. Alternatively, one might represent ontological knowledge through additional embedding dimensions capturing historical period, geographic location, and cultural synthesis, but it remains unclear how these dimensions would mechanistically prevent the combination of Ottoman and Andalusian features beyond providing additional signals that the model might or might not learn to use appropriately.
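The first option, explicit rules over required and forbidden features, might look like the sketch below. It is an illustration under stated assumptions rather than anything the paper specifies: the category and feature labels (`andalusian_mosque`, `horseshoe_arch`, `ottoman_dome`) are hypothetical identifiers, and the rule table is hand-written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleRule:
    """Hand-written constraint for one architectural category (hypothetical schema)."""
    required: frozenset
    forbidden: frozenset


# Illustrative rule table; labels are assumptions, not taken from any real knowledge base.
RULES = {
    "andalusian_mosque": StyleRule(
        required=frozenset({"horseshoe_arch"}),
        forbidden=frozenset({"ottoman_dome"}),
    ),
}


def check_features(category, features):
    """Return rule violations for a category, or None when no rule exists."""
    rule = RULES.get(category)
    if rule is None:
        return None  # the rule base is silent about this category
    present = set(features)
    return {
        "missing": sorted(rule.required - present),
        "forbidden_present": sorted(rule.forbidden & present),
    }


print(check_features("andalusian_mosque", ["horseshoe_arch", "ottoman_dome"]))
print(check_features("persian_mosque", ["iwan"]))
```

The brittleness is visible even at this scale: the checker has nothing to say about any category outside its table, and it treats a feature as present or absent by exact label match, with no notion of degree, regional variation, or later restoration.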
The paper acknowledges this challenge, noting that ontological embeddings would likely require semi-supervised learning where human experts evaluate whether generated content respects ontological constraints, with feedback used to refine embeddings. This introduces a practical circularity: we need ontological embeddings to generate ontologically coherent outputs, but we need to generate outputs to learn what constitutes ontological coherence. The proposal that models would learn when to weight ontological versus distributional attention based on context essentially delegates the core problem, distinguishing constitutive from correlative features, back to the learning process rather than solving it through architectural innovation.
A deeper question concerns whether the ontological gap as formulated represents a single coherent phenomenon or conflates multiple distinct limitations. The paper presents failures across several domains: cultural incoherence in image generation (Andalusian mosques with Ottoman domes), semantic stereotyping (justice always depicted as Lady Justice), temporal inconsistency in video (disappearing shoes during walking sequences), and cultural insensitivity (Egyptian weddings depicted with Moroccan or Saudi elements). These failures share a family resemblance (all involve generating outputs that are distributionally plausible but somehow "wrong"), yet they may have different underlying causes requiring different solutions.
Temporal inconsistency in video, for instance, might be better addressed through explicit physical and causal modeling rather than semantic ontology. The problem that shoes disappear during walking sequences arguably stems from treating frame generation as semi-independent tasks constrained primarily by visual similarity rather than from lacking access to the ontological meaning of "walking." Current video models already struggle with physical consistency (violating conservation of mass, spontaneously generating or destroying objects, failing to maintain object permanence) in ways that seem related to architectural limitations in temporal modeling rather than to insufficient understanding of what activities mean. Augmenting these models with knowledge that "walking requires persistent feet" addresses the symptom but perhaps not the cause, which may be that frame-by-frame generation with limited temporal context cannot easily maintain long-term physical consistency regardless of semantic understanding.
Similarly, the conflation of architectural styles from different Islamic traditions when generating mosques might reflect dataset composition and label granularity rather than fundamental architectural incapacity. If training data contains images tagged generically as "mosque" or "Islamic architecture" without fine-grained regional and temporal specificity, models learn correspondingly coarse associations. Improving label granularity, so that training data distinguishes Andalusian, Ottoman, Persian, and Moroccan architectural traditions, might substantially reduce style mixing without requiring ontological embeddings. This raises the question of whether the ontological gap is an architectural limitation that no amount of better data labeling could address, or whether it partially reflects data quality issues that are technically distinct from the philosophical concerns the paper foregrounds.
The paper anticipates this objection, arguing that even with perfect data diversity, purely distributional approaches cannot distinguish essential from accidental features. But this claim requires more careful examination. Modern machine learning has shown remarkable capacity to discover subtle patterns and hierarchical representations through optimization on appropriate objectives. If training included carefully curated examples where human experts had labeled features as constitutive versus merely correlative for specific concepts, models might learn to make similar distinctions without explicit ontological embeddings. The question becomes empirical: can distributional learning with sufficiently rich supervision discover ontological relationships, or does this require fundamentally different architectural mechanisms?
The proposal's emphasis on cultural sensitivity and authenticity raises important ethical considerations. Ghazouani argues that current models trained on internet-scale datasets disproportionately represent certain cultural perspectives, and that ontological grounding could address this by making cultural knowledge explicit rather than leaving it implicit in distributional patterns. The vision is that different cultural communities could contribute ontological knowledge about their own practices, which models could access when generating culturally specific content. This participatory approach has genuine appeal as a response to concerns about AI systems imposing dominant cultural perspectives or erasing cultural specificity through homogenization.
However, the framework faces a tension between respecting cultural specificity and avoiding cultural essentialism. Encoding that Egyptian weddings have certain constitutive features risks freezing living practices into static definitions that deny cultural dynamism and variation. Cultures evolve; contemporary practitioners may creatively adapt or reject historical practices; individuals within cultures may contest what counts as "authentic" representation. The paper acknowledges this challenge, suggesting that ontological embeddings should represent multiple cultural perspectives and allow contextual selection rather than imposing monolithic definitions. But implementing this pluralistic approach while maintaining operational utility remains underspecified. If ontological embeddings encode multiple potentially conflicting perspectives on what makes Egyptian weddings Egyptian, how does the model decide which perspective to deploy in any particular generation task?
Furthermore, questions of epistemic authority become acute when formalizing cultural knowledge. The paper proposes that cultural communities should have authority over how their concepts and practices are represented, treating cultural knowledge as something communities possess and should govern. This principle seems ethically sound, but implementation faces practical complications. Cultural communities are not monolithic; they contain internal diversity, disagreement, and power asymmetries. Who speaks for a culture when specifying its ontological knowledge? How are decisions made when community members disagree about authentic representation? These questions have no purely technical answers; they require careful attention to cultural politics and power relations that extend well beyond architectural design.
The evaluation challenges identified in the paper are substantial and perhaps underappreciated. Ghazouani proposes a tiered evaluation approach incorporating automated consistency checks, expert human evaluation, community-based evaluation, and comparative studies. This multi-tiered strategy appropriately recognizes that ontological coherence cannot be assessed through existing automated metrics like FID score or CLIP similarity. But human evaluation, particularly expert and community evaluation, introduces practical barriers that may limit research velocity. If each model iteration requires extensive human review by domain experts and cultural community members, the feedback cycles necessary for iterative improvement become prohibitively slow and expensive compared to current practices where researchers can rapidly test variants using automated benchmarks.
This raises a broader question about what kind of AI research the ontological approach would enable or constrain. Current paradigms favor rapid experimentation, large-scale training, and optimization on metrics that can be computed automatically. The Teleo-Transformer framework, if implemented as proposed, would require slower, more deliberative research processes involving sustained interdisciplinary collaboration and participatory evaluation. Whether the research community would adopt such approaches depends partly on whether the benefits (culturally grounded, ontologically coherent outputs) prove sufficiently valuable to justify the additional complexity and cost.
The paper's philosophical grounding in phenomenology and hermeneutics provides conceptual richness but also introduces assumptions that warrant examination. The framework presumes that concepts have identifiable ontological grounds that can be recovered through etymological and historical analysis. This historicist view draws on traditions emphasizing how meaning is constituted through historical practice and cultural inheritance. Yet alternative philosophical perspectives might challenge these foundations. Nominalist positions might argue that meanings are ultimately conventional and that seeking deep ontological grounds reifies what are fundamentally arbitrary social agreements. Poststructuralist perspectives might emphasize the instability and contestability of meaning in ways that resist formalization into stable ontological embeddings.
The paper's response would likely be that regardless of philosophical debates about the ultimate nature of meaning, there exist practically important distinctions between representations that respect cultural and historical specificity and those that do not. An image of an Andalusian mosque that contains only Ottoman architectural elements fails on practical grounds even if one remains agnostic about ontological realism. This pragmatic defense has merit, but it shifts the justification from philosophical foundations to instrumental utility, which may be more appropriate for an engineering proposal than the paper's phenomenological framing suggests.
One underexplored aspect concerns the relationship between ontological knowledge and physical causality. The paper distinguishes semantic ontology (what concepts mean culturally and historically) from physical ontology (what objects are in physical terms) and causal ontology (what events are in causal terms), suggesting these should ultimately be integrated. But the proposal focuses almost exclusively on semantic and cultural ontology, leaving physical and causal dimensions underspecified. Yet many of the failures the paper identifies (temporal inconsistencies in video, objects violating physical laws, impossible event sequences) seem to require physical and causal modeling rather than, or in addition to, cultural grounding. Walking requires persistent feet not primarily because of the cultural meaning of "walking" but because of physical facts about how bipedal locomotion works. Integrating semantic, physical, and causal ontology into a coherent framework presents challenges the paper acknowledges but does not resolve.
The scope limitations the paper identifies are appropriately honest. Teleo-Transformers would not improve low-level perceptual quality, computational efficiency, or physics-based simulation. They would not eliminate all cultural bias or guarantee cultural sensitivity. They would not solve the fundamental challenge of cultural plurality and contestation. These limitations circumscribe the framework's potential impact, suggesting it addresses a specific class of failures (conceptual incoherence stemming from lack of cultural and historical grounding) while leaving other challenges to different approaches. This circumscribed scope is entirely appropriate for a position paper establishing conceptual foundations, but it means the framework cannot be evaluated as a comprehensive solution to AI's cultural and conceptual limitations.
The paper's positioning as an invitation to interdisciplinary collaboration rather than a complete technical solution reflects appropriate epistemic humility. Ghazouani explicitly acknowledges that AI researchers alone cannot construct ontological knowledge bases or evaluate cultural authenticity, and that historians, anthropologists, linguists, cultural theorists, and communities themselves must be partners in this research program. This vision of AI research as necessarily interdisciplinary, where computational innovation serves rather than replaces humanistic scholarship, represents a valuable corrective to tendencies toward technical solutionism. Whether such collaborations can be sustained at the scale required to build comprehensive ontological knowledge bases across languages and cultures remains uncertain, but the aspiration merits serious consideration.
Looking forward, several research directions would help clarify whether the Teleo-Transformer framework addresses genuine architectural limitations or whether alternative approaches might achieve similar goals. Empirical work comparing distributional models trained on carefully curated, richly labeled data against models augmented with explicit ontological embeddings could determine whether the proposed architectural innovations provide benefits beyond what better data alone might achieve. Investigations into whether large language models with sufficient scale and training on diverse corpora spontaneously develop representations that capture ontological relationships would test whether the gap is truly architectural or might be bridged through scale and data quality. Prototype implementations focusing on limited domains (perhaps architectural history or specific cultural practices where ontological knowledge is well documented) could demonstrate feasibility and reveal practical challenges not apparent at the conceptual level.
More broadly, the paper invites reflection on what we want AI systems to be able to do and what capabilities matter beyond technical performance metrics. If the goal is systems that engage respectfully and authentically with cultural diversity, historical complexity, and human meaning-making practices, then the ontological gap Ghazouani identifies represents a genuine limitation worth addressing through architectural innovation, knowledge engineering, or both. If the goal is systems that generate outputs users find useful and aesthetically pleasing regardless of cultural authenticity or historical accuracy, then the gap may matter less, and current approaches focused on statistical fidelity may suffice.
The question ultimately concerns values as much as architectures. The Teleo-Transformer proposal embodies a particular vision: that AI systems should treat language and culture not as patterns to be extracted and replicated but as inheritances to be understood and respected. This vision has ethical and intellectual appeal, but realizing it would require research communities, funding agencies, and technology companies to prioritize cultural grounding and historical authenticity alongside technical performance—a shift that faces institutional and economic barriers extending well beyond the technical challenges the paper addresses.
Whether the specific mechanisms Ghazouani proposes (Causal Embeddings, dual attention over distributional and ontological spaces, integration of etymological and cultural knowledge bases) prove to be the right approach remains to be determined through empirical investigation that this position paper explicitly defers. But the broader challenge the paper articulates is difficult to dismiss: that AI systems capable of remarkable statistical learning may nonetheless engage with human meaning at a level that is, in important respects, superficial. Addressing this superficiality, if indeed it can be addressed, will require grappling with the difficult questions about meaning, culture, and history that the paper raises, questions that resist purely computational answers and demand genuinely interdisciplinary engagement.
The ontological gap, whether fully addressed by this proposal or not, names a real phenomenon that deserves continued attention as AI systems increasingly shape how culture is represented, transmitted, and understood. The path forward may involve the specific architectural innovations Ghazouani outlines, or it may require alternative approaches not yet envisioned, but the destination (AI systems capable of engaging with human meaning in its cultural and historical richness) remains a worthy aspiration even if the route remains uncertain.
