Language models and Wittgenstein
September 1, 2022
Contents
- Introduction
- Questions of meaning
- Wittgenstein’s evolution
- Word meanings in deep learning
- Conclusion
- References
Introduction
- OpenAI. (2022). ChatGPT: Optimizing Language Models for Dialogue.
- OpenAI. (2023). GPT-4.
Questions of meaning
- Intension vs extension
- Syntax vs semantics
- Putnam, H. (1973). Meaning and reference. 1
- Putnam, H. (1975). The meaning of “meaning”. 2
Wittgenstein’s evolution
Background
- The Vienna Circle
- Edmonds, D. (2020). The Murder of Professor Schlick: The Rise and Fall of the Vienna Circle. 3
- Tractatus Logico-Philosophicus (1922) 4
- Philosophical Investigations (1953) 5
The picture theory of meaning
- Tractatus Logico-Philosophicus (1922) 6
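- The picture theory of meaning goes hand in hand with a correspondence theory of truth: a proposition is true when the arrangement it pictures matches an arrangement in the world.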
- philosophy-in-figures/picture-theory-of-meaning-wittgenstein
- Daitz, E. (1953). The picture theory of meaning. 7
- Keyt, D. (1964). Wittgenstein’s picture theory of language. 8
- Gaskin, R. (2009). Realism and the picture theory of meaning. 9
Meaning as use
- Philosophical Investigations (1953)
- The theory of meaning as use pairs more naturally with a coherence theory of truth.
- Carnap
- Principle of tolerance
- Firth
- Firth, J.R. (1957): “You shall know a word by the company it keeps.” 10
Wittgenstein in PI:
The meaning of a word is its use in the language. 11
and
One cannot guess how a word functions. One has to look at its use, and learn from that. 12
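Firth's dictum and the "look at its use" remark above can be made concrete with a small distributional sketch: represent each word by counts of the words that appear near it, then compare words by cosine similarity. This is only an illustration under stated assumptions. The five-sentence corpus, the window size of 2, and the helper names `cooccurrence_vectors` and `cosine` are all made up for this example; nothing here comes from Firth or Wittgenstein.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the bank lends money",
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])      # shared company: the, drinks, chases
sim_cat_money = cosine(vectors["cat"], vectors["money"])  # no shared company in this corpus
```

In this toy corpus "cat" and "dog" keep nearly the same company and so come out as similar, while "cat" and "money" share no neighbours at all. The similarity is read off from use alone, with no reference to what the words denote.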
Word meanings in deep learning
Word embeddings
- Mikolov et al. (2013): word2vec 13
- Word (token) embeddings
- Olah, C. (2014). Deep learning, NLP, and representations.
- Alammar, J. (2019). The illustrated word2vec.
- Migdal, P. (2017). king - man + woman is queen; but why?
- Discussion of relations captured by word2vec by The Gradient
- Perone, C.S. (2018). NLP word representations and the Wittgenstein philosophy of language. 14
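The "king - man + woman" arithmetic discussed in the sources above can be sketched with a few hand-made vectors. The 2-d vectors below are hand-picked for illustration and are not trained word2vec embeddings; the axes (roughly "royalty" and "gender") and the helper name `analogy` are assumptions of this sketch. Excluding the three input words from the candidates follows common practice for this kind of analogy query.

```python
import math

# Hand-picked toy vectors (NOT trained embeddings).
# Axis 0 loosely stands for "royalty", axis 1 for "gender".
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vocab[w], target))


result = analogy("king", "man", "woman", toy)
```

With these toy vectors the target lands exactly on "queen". With real embeddings the match is only approximate, which is part of what Migdal's article examines.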
Transformers
- Vaswani et al. (2017): Transformer 15
- Positional encodings
- Self-attention
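The two ingredients listed above can be sketched in a few lines of plain Python. The sinusoidal positional encoding and the scaled dot-product form softmax(QK^T / sqrt(d))V follow Vaswani et al. (2017). As a deliberate simplification, this sketch sets Q = K = V = x, with no learned projection matrices, no multiple heads, and no masking. The toy token vectors and function names are assumptions made for illustration.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (simplifying assumption)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output row is a weighted average of all input rows.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, x)) for c in range(d)])
    return outputs, weights


# Toy "token embeddings": the first and third tokens are the same word.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
pe = positional_encoding(len(tokens), 4)
x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]
outputs, weights = self_attention(x)
```

Adding the positional encodings is what lets the first and third tokens, which are the same word, end up with different representations. Each output row then mixes information from every position, weighted by similarity.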
Generative language modeling
- Generating text by repeatedly predicting the next token with a language model
- GPT
- Wolfram, S. (2023). What is ChatGPT doing—and why does it work? 16
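The generation loop itself is simple: predict a next token, append it, and repeat. The sketch below uses bigram counts as a stand-in for the neural network, which is an assumption made purely to keep the example self-contained. GPT-style models condition on the whole preceding context and usually sample from a probability distribution, whereas this sketch looks only at the last token and picks greedily. The function names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive decoding: append the most frequent follower, then repeat."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never seen with a follower
        out.append(followers.most_common(1)[0][0])
    return out


corpus = "the meaning of a word is its use in the language".split()
model = train_bigram(corpus)
generated = generate(model, ["the", "meaning"], max_new_tokens=4)
```

Here `generated` continues the prompt with "of a word is", one token at a time, each choice depending only on what has been produced so far.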
Discussion
- Belloni, M. (2019). Neural networks and philosophy of language: Why Wittgenstein’s theories are the basis of all modern NLP.
- Goldhill, O. (2019). Google Translate is a manifestation of Wittgenstein’s theory of language.
- Skelac, I. & Jandric, A. (2020). Meaning as use: From Wittgenstein to Google’s Word2vec. 17
- Boccelli, D. (2022). Word embeddings align with Kandinsky’s theory of color.
- Tweet by Joscha Bach, Mar 25, 2023
Piantadosi:
Modern large language models integrate syntax and semantics in the underlying representations: encoding words as vectors in a high-dimensional space, without an effort to separate out e.g. part of speech categories from semantic representations, or even predict at any level of analysis other than the literal word. Part of making these models work well was in determining how to encode semantic properties into vectors, and in fact initializing word vectors via encodings of distribution semantics from e.g. Mikolov et al. 2013 (Radford et al. 2019). Thus, an assumption of the autonomy of syntax is not required to make models that predict syntactic material and may well hinder it. 18
Conclusion
Bender & Lascarides:
While machine learning has made impressive progress on many separate tasks (given appropriately curated data sets), the keys to generalizing beyond any specific set of end-to-end systems lie in modeling the linguistic system itself such that that model can be reused across tasks. And this can only be done based on an understanding of how language works, including how sentences come to mean what they mean (semantics) and how speakers can use sentence meaning to convey communicative intent (pragmatics). 19
TODO: Not acknowledging the evidence.
TODO: Fallacy of moving the goalposts.
- A surprising amount about the world can be learned from simply predicting what will come next in written language.
- What do words mean?
- LLMs can be seen as a vindication of the later Wittgenstein’s theory of meaning as use.
- On the other hand, the division between syntax and semantics is porous.
- That true structural relations can be found in language alone supports a view of accurate correspondence, something like the picture theory of meaning.
- Structural realism
- Both of Wittgenstein’s views remain relevant.
References
Putnam (1973).↩︎
Putnam (1975).↩︎
Edmonds (2020).↩︎
Wittgenstein (1961).↩︎
Wittgenstein (2009).↩︎
Wittgenstein (1961), p. TODO↩︎
Daitz (1953).↩︎
Keyt (1964).↩︎
Gaskin (2009).↩︎
Firth (1957).↩︎
Wittgenstein (2009), §43.↩︎
Wittgenstein (2009), §340.↩︎
Mikolov, Chen, Corrado, & Dean (2013), Mikolov et al. (2013), and Mikolov, T. et al. (2013).↩︎
Perone (2018).↩︎
Vaswani et al. (2017).↩︎
Wolfram (2023).↩︎
Skelac & Jandric (2020).↩︎
Piantadosi (2023), p. 15.↩︎
Bender & Lascarides (2020).↩︎