Can Vectors Capture Meaning?

Authors

  • Klaudia Bárány, Eötvös Loránd University

Abstract

The Challenge of Compositionality

Fodor and Pylyshyn [1] argued that connectionist models struggle with linguistic compositionality, the principle that meaning arises from constituents and their syntactic arrangement, which is a core requirement of the Language of Thought (LoT) hypothesis. This project computationally investigates whether Word2Vec, a modern distributional-semantics model, supports compositionality, using both simple vector operations and Smolensky’s [2] structured Tensor Product Representation (TPR) to encode meaning in a way that is sensitive to syntactic roles.

Methodology: Probing Word2Vec with Compositional Tasks

Using the pre-trained Google News 300-dimensional Word2Vec model, five compositional probes were tested:

  • Adjective-Noun (A+N) pairs combined by simple vector addition (e.g., “hard rock”);
  • Subject-Verb-Object (SVO) sentences (e.g., “dog bites man”) contrasted with Object-Verb-Subject (OVS) structures (e.g., “man bites dog”), also using simple vector addition;
  • semantic analogies of the form A – B + C (e.g., “king – man + woman ≈ queen”);
  • simple additive combinations (e.g., “unmarried + man ≈ bachelor”);
  • TPR encodings of the SVO/OVS sentences.
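The minimal sketch below (not the authors’ exact code) illustrates how such probes can be run against the pre-trained embeddings; it assumes the standard GoogleNews-vectors-negative300.bin file and the gensim KeyedVectors API, and the specific probe words are taken from the examples above.

    # Minimal probing sketch; assumes the standard Google News vectors file
    # and the gensim KeyedVectors API.
    from gensim.models import KeyedVectors

    # Load the pre-trained 300-dimensional Google News embeddings.
    model = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # Adjective-Noun probe: combine "hard" + "rock" by addition, inspect neighbors.
    an_composite = model["hard"] + model["rock"]
    print(model.similar_by_vector(an_composite, topn=5))

    # Analogy probe: king - man + woman ≈ queen.
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # Simple additive combination probe: unmarried + man ≈ bachelor.
    print(model.similar_by_vector(model["unmarried"] + model["man"], topn=5))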

Results

Simple vector addition (subject_vec + verb_vec + object_vec) yielded identical composite vectors for SVO and OVS sentences (cosine similarity = 1.0); because addition is commutative, the composite is necessarily insensitive to word order and syntactic rules. This supports the LoT argument for structure-sensitive mechanisms. For A + N pairs, vector addition performed a form of semantic blending, often producing plausible composite neighbors and aligning with predefined target words for common or concrete pairings (e.g., “hard rock” yielding music-related terms). However, it struggled with other combinations (e.g., “weak coffee”), suggesting that it reflects co-occurrence statistics rather than rule-based modification (the composite vector was not significantly closer to the noun than to the adjective, p > .05).
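Because addition is commutative, this order-insensitivity can be demonstrated without the trained embeddings at all; the sketch below uses random stand-in vectors in place of the Word2Vec vectors used in the study.

    import numpy as np

    rng = np.random.default_rng(0)
    # Random 300-dimensional stand-ins for the vectors of "dog", "bites", "man".
    dog, bites, man = (rng.standard_normal(300) for _ in range(3))

    svo = dog + bites + man   # "dog bites man"
    ovs = man + bites + dog   # "man bites dog"

    # Addition is commutative, so the two composites are the same vector.
    cosine = svo @ ovs / (np.linalg.norm(svo) * np.linalg.norm(ovs))
    print(cosine)  # 1.0 (up to floating-point rounding)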

In contrast, vector arithmetic successfully resolved many semantic analogies (e.g., king – man + woman ≈ queen), indicating that Word2Vec captures abstract relationships as consistent vector differences. Simple A + B = C additions (e.g., unmarried + man ≈ bachelor) showed mixed success, highlighting the limitations of pure addition even for purely semantic combinations. Crucially, a demonstration of TPR for the SVO/OVS sentences, using outer products to bind distinct role vectors (grammatical functions) to filler vectors (the actual words), showed a non-zero “tensor distance” between the SVO and OVS matrix representations. The TPR method thus successfully encoded the structural difference, indicating its potential for modeling structured composition.
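A minimal TPR sketch follows. The abstract does not specify how the role vectors were constructed, so orthogonal basis vectors stand in for the subject/verb/object roles and random vectors stand in for the word (filler) embeddings; the essential steps are the binding by outer product and the resulting non-zero tensor distance.

    import numpy as np

    rng = np.random.default_rng(0)

    # Filler vectors (Word2Vec embeddings in the actual probe; random stand-ins here).
    dog, bites, man = (rng.standard_normal(300) for _ in range(3))

    # One role vector per grammatical function: subject, verb, object.
    r_subj, r_verb, r_obj = np.eye(3)

    def tpr(bindings):
        # Sum of outer products of (filler, role) pairs -> a 300 x 3 matrix.
        return sum(np.outer(f, r) for f, r in bindings)

    svo = tpr([(dog, r_subj), (bites, r_verb), (man, r_obj)])  # "dog bites man"
    ovs = tpr([(man, r_subj), (bites, r_verb), (dog, r_obj)])  # "man bites dog"

    # Frobenius-norm "tensor distance": non-zero, so word order is preserved.
    print(np.linalg.norm(svo - ovs))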

Conclusion

Overall, this study computationally confirms that while Word2Vec vector spaces encode rich semantic relationships exploitable by vector arithmetic, simple vector addition is an inadequate mechanism for robust linguistic compositionality, particularly concerning syntax. These findings reinforce the LoT’s emphasis on structured representations [1]. While simple vector addition fails to meet these structural demands, the successful differentiation of SVO/OVS structures by the TPR demonstrates that more sophisticated, structure-sensitive operations can leverage distributional word embeddings to better model compositional meaning. This highlights the need for such advanced compositional operations in computational models of language and thought.

References

[1] J. A. Fodor and Z. W. Pylyshyn, “Connectionism and cognitive architecture: a critical analysis,” Cognition, vol. 28, no. 1-2, pp. 3-71, Mar. 1988, doi: 10.1016/0010-0277(88)90031-5.

[2] P. Smolensky, “Tensor product variable binding and the representation of symbolic structures in connectionist systems,” Artificial Intelligence, vol. 46, no. 1-2, pp. 159-216, Nov. 1990, doi: 10.1016/0004-3702(90)90007-M.

Published

2025-06-10