Language models and Wittgenstein

September 1, 2022

Introduction

OpenAI. (2022). ChatGPT: Optimizing Language Models for Dialogue.
OpenAI. (2023). GPT-4.

Questions of meaning

Intension vs extension
Syntax vs semantics
Putnam, H. (1973). Meaning and reference. ¹
Putnam, H. (1975). The meaning of “meaning”. ²

Wittgenstein’s evolution

Background

The Vienna Circle
Edmonds, D. (2020). The Murder of Professor Schlick: The Rise and Fall of the Vienna Circle. ³
Tractatus Logico-Philosophicus (1922) ⁴
Philosophical Investigations (1953) ⁵

The picture theory of meaning

Tractatus Logico-Philosophicus (1922) ⁶
The picture theory of meaning is a correspondence theory of truth.
philosophy-in-figures/picture-theory-of-meaning-wittgenstein
Daitz, E. (1953). The picture theory of meaning. ⁷
Keyt, D. (1964). Wittgenstein’s picture theory of language. ⁸
Gaskin, R. (2009). Realism and the picture theory of meaning. ⁹

Meaning as use

Philosophical Investigations (1953)
The theory of meaning as use is a coherence theory of truth.
Carnap
- Principle of tolerance
Firth
- Firth, J.R. (1957): “You shall know a word by the company it keeps.” ¹⁰
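Firth's slogan is the distributional hypothesis: words that occur in similar contexts tend to have similar meanings. Below is a minimal sketch of that idea. It assumes a hypothetical four-sentence corpus and a symmetric context window of two words. Each word is represented by counts of its neighbours, and words are compared by cosine similarity. The names `cooccurrence_vectors` and `cosine` are illustrative helpers. Learned embeddings such as word2vec do not work this way: they train dense vectors rather than counting. The underlying intuition is the same.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, chosen only for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))   # words sharing contexts
print(round(cosine(vecs["cat"], vecs["milk"]), 3))  # words sharing few contexts
```

Even on this tiny corpus, "cat" and "dog" come out far more similar to each other than "cat" and "milk", purely because of the company they keep.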

Wittgenstein in the Philosophical Investigations (PI):

The meaning of a word is its use in the language. ¹¹

and

One cannot guess how a word functions. One has to look at its use, and learn from that. ¹²

Word meanings in deep learning

Word embeddings

Figure 1: Left panel shows vector offsets for three word pairs illustrating the gender relation. Right panel shows a different projection, and the singular/plural relation for two words. In high-dimensional space, multiple relations can be embedded for a single word (Mikolov, Yih, & Zweig, 2013).

Mikolov: word2vec ¹³
- Word (token) embeddings
Olah, C. (2014). Deep learning, NLP, and representations.
Alammar, J. (2019). The illustrated word2vec.
Migdal, P. (2017). king - man + woman is queen; but why?
The Gradient: discussion of the relations captured by word2vec
Perone, C.S. (2018). NLP word representations and the Wittgenstein philosophy of language. ¹⁴
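The "king - man + woman is queen" result comes from vector offsets. To solve "a is to b as c is to ?", compute b - a + c and return the nearest remaining word by cosine similarity. Below is a minimal sketch. The three-dimensional vectors are hand-made and hypothetical, not learned by word2vec. Their dimensions loosely stand for royalty, maleness and femaleness. Excluding the query words from the candidates follows a common convention in analogy evaluation.

```python
import math

# Hand-made 3-d "embeddings" (hypothetical values, NOT trained word2vec vectors).
# Dimensions loosely encode: royalty, maleness, femaleness.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vocab):
    """Solve 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    # Skip the query words themselves, otherwise the answer is often one of them.
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("man", "king", "woman", embeddings))
```

With these toy vectors the offset king - man + woman lands closest to "queen". In real embeddings, the relation directions are learned from co-occurrence statistics rather than placed by hand.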

Transformers

Figure 2: Meme about the fame of the transformer network architecture (source: @mishig25).

Transformer ¹⁵
Positional encodings
Self-attention
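The two ingredients above can be sketched in plain Python. The sinusoidal positional encoding follows Vaswani et al. (2017): PE[pos][2i] = sin(pos / 10000^(2i/d_model)) and PE[pos][2i+1] = cos of the same angle. The self-attention is single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. This is a minimal sketch under stated assumptions. The sequence length, model width and random weight matrices are hypothetical stand-ins for learned parameters. The helper names are illustrative only. Lists replace a tensor library to keep it dependency-free.

```python
import math
import random

def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd ones."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2k in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy setup: hypothetical sizes; random matrices stand in for learned weights.
rng = random.Random(0)
seq_len, d_model = 4, 8

def random_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

token_embeddings = random_matrix(seq_len, d_model)
pe = positional_encoding(seq_len, d_model)
# Inject word-order information by adding the encodings element-wise.
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(token_embeddings, pe)]
w_q, w_k, w_v = (random_matrix(d_model, d_model) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(len(output), len(output[0]))
print([round(sum(row), 6) for row in weights])  # each row of weights sums to 1
```

Each output row is a weighted mixture of all value vectors, so every token's representation is shaped by the other tokens around it. GPT-style decoders also apply a causal mask, so that each position attends only to itself and earlier positions. That mask is omitted here.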

Generative language modeling

Generating text by repeatedly applying a language model's next-token prediction
GPT
Wolfram, S. (2023). What is ChatGPT doing—and why does it work? ¹⁶
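The generation loop that Wolfram describes works step by step. At each step, the model predicts a distribution over the next token, one token is sampled, it is appended, and the process repeats. Below is a minimal sketch of that loop. It assumes a hypothetical bigram count model as a stand-in for a trained transformer. The bigram model conditions only on the last token, whereas GPT conditions on the whole context. The names `next_token_distribution` and `generate` are illustrative. The temperature handling (raising probabilities to the power 1/T) is one common scheme, not necessarily the one any particular product uses.

```python
import random
from collections import Counter, defaultdict

# Hypothetical training text; real LLMs are trained on vastly more.
text = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."
tokens = text.split()

# "Training": count which token follows which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    next_counts[prev][nxt] += 1

def next_token_distribution(context):
    """P(next token | context); this bigram stand-in looks only at context[-1]."""
    counts = next_counts[context[-1]]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(prompt, max_new_tokens=8, temperature=1.0, seed=0):
    """Autoregressive loop: predict, sample one token, append, repeat.
    temperature must be > 0; lower values make sampling greedier."""
    rng = random.Random(seed)
    context = prompt.split()
    for _ in range(max_new_tokens):
        dist = next_token_distribution(context)
        if not dist:  # last token never seen with a successor in training
            break
        toks = list(dist)
        weights = [dist[t] ** (1.0 / temperature) for t in toks]
        context.append(rng.choices(toks, weights=weights, k=1)[0])
    return " ".join(context)

print(generate("the cat"))
```

Nothing in the loop is specific to the bigram model. Swapping in a better `next_token_distribution` is, in outline, what separates this toy from a large language model.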

Discussion

Belloni, M. (2019). Neural networks and philosophy of language: Why Wittgenstein’s theories are the basis of all modern NLP.
Goldhill, O. (2019). Google Translate is a manifestation of Wittgenstein’s theory of language.
Skelac, I. & Jandric, A. (2020). Meaning as use: From Wittgenstein to Google’s Word2vec. ¹⁷
Boccelli, D. (2022). Word embeddings align with Kandinsky’s theory of color.
Tweet by Joscha Bach, Mar 25, 2023

Piantadosi:

Modern large language models integrate syntax and semantics in the underlying representations: encoding words as vectors in a high-dimensional space, without an effort to separate out e.g. part of speech categories from semantic representations, or even predict at any level of analysis other than the literal word. Part of making these models work well was in determining how to encode semantic properties into vectors, and in fact initializing word vectors via encodings of distribution semantics from e.g. Mikolov et al. 2013 (Radford et al. 2019). Thus, an assumption of the autonomy of syntax is not required to make models that predict syntactic material and may well hinder it. ¹⁸

Conclusion

Bender & Lascarides:

While machine learning has made impressive progress on many separate tasks (given appropriately curated data sets), the keys to generalizing beyond any specific set of end-to-end systems lie in modeling the linguistic system itself such that that model can be reused across tasks. And this can only be done based on an understanding of how language works, including how sentences come to mean what they mean (semantics) and how speakers can use sentence meaning to convey communicative intent (pragmatics). ¹⁹

TODO: Not acknowledging the evidence.

TODO: Fallacy of moving the goalposts.

A surprising amount about the world can be learned from simply predicting what will come next in written language.
What do words mean?
LLMs can be seen as a vindication of the later Wittgenstein's theory of meaning as use.
On the other hand, the division between syntax and semantics is porous.
That true structural relations can be found in language alone supports a view of accurate correspondence between language and world, something like the early Wittgenstein's picture theory of meaning.
Structural realism
Both of Wittgenstein’s views remain relevant.

References

Bender, E. M. & Lascarides, A. (2020). Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Morgan & Claypool. https://link.springer.com/book/10.1007/978-3-031-02172-5

Daitz, E. (1953). The picture theory of meaning. Mind, 62, 184–201. https://www.jstor.org/stable/2251383

Edmonds, D. (2020). The Murder of Professor Schlick: The Rise and Fall of the Vienna Circle. Princeton University Press.

Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in Linguistic Analysis (pp. 1–31). Oxford: Blackwell.

Gaskin, R. (2009). Realism and the picture theory of meaning. Philosophical Topics, 37, 49–62. https://www.jstor.org/stable/43154543

Keyt, D. (1964). Wittgenstein’s picture theory of language. The Philosophical Review, 73, 493–511. https://www.jstor.org/stable/2183303

Mikolov, T. et al. (2013). Distributed representations of words and phrases and their compositionality. https://arxiv.org/abs/1310.4546

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781

Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. NAACL HLT 2013. https://www.aclweb.org/anthology/N13-1090.pdf

Perone, C. S. (2018). NLP word representations and the Wittgenstein philosophy of language. http://blog.christianperone.com/2018/05/nlp-word-representations-and-the-wittgenstein-philosophy-of-language/

Piantadosi, S. T. (2023). Modern language models refute Chomsky’s approach to language. https://lingbuzz.net/lingbuzz/007180

Putnam, H. (1973). Meaning and reference. The Journal of Philosophy, 70, 699–711. https://www.jstor.org/stable/2025079

———. (1975). The meaning of “meaning”. In Mind, Language and Reality. Philosophical Papers, vol. 2 (pp. 215–271). Cambridge University Press.

Skelac, I. & Jandric, A. (2020). Meaning as use: From Wittgenstein to Google’s Word2vec. In S. Skansi (Ed.), Guide To Deep Learning Basics: Logical, Historical And Philosophical Perspectives (pp. 41–53). Springer.

Vaswani, A. et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017, 5998–6008. https://arxiv.org/abs/1706.03762

Wittgenstein, L. (1961). Tractatus Logico-Philosophicus. (D. F. Pears & B. F. McGuinness, Trans.). Routledge. (Originally published in 1922). https://people.umass.edu/klement/tlp/tlp.html

———. (2009). Philosophical Investigations. (G. E. M. Anscombe, P. M. S. Hacker, & J. Schulte, Trans.; P. M. S. Hacker & J. Schulte, Eds.) (4th ed.). Wiley-Blackwell. (Originally published in 1953).

Wolfram, S. (2023). What is ChatGPT doing—and why does it work? https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

1. Putnam (1973).
2. Putnam (1975).
3. Edmonds (2020).
4. Wittgenstein (1961).
5. Wittgenstein (2009).
6. Wittgenstein (1961), p. TODO
7. Daitz (1953).
8. Keyt (1964).
9. Gaskin (2009).
10. Firth (1957).
11. Wittgenstein (2009), §43.
12. Wittgenstein (2009), §340.
13. Mikolov, Chen, Corrado, & Dean (2013), Mikolov et al. (2013), and Mikolov, T. et al. (2013).
14. Perone (2018).
15. Vaswani, A. et al. (2017).
16. Wolfram (2023).
17. Skelac & Jandric (2020).
18. Piantadosi (2023), p. 15.
19. Bender & Lascarides (2020).