Language models and Wittgenstein

September 1, 2022

Contents

  1. Introduction
  2. Questions of meaning
  3. Wittgenstein’s evolution
    1. Background
    2. The picture theory of meaning
    3. Meaning as use
  4. Word meanings in deep learning
    1. Word embeddings
    2. Transformers
    3. Generative language modeling
    4. Discussion
  5. Conclusion
  6. References

Introduction

Questions of meaning

Wittgenstein’s evolution

Background

The picture theory of meaning

Meaning as use

In the Philosophical Investigations (PI), Wittgenstein writes:

The meaning of a word is its use in the language. [11]

and

One cannot guess how a word functions. One has to look at its use, and learn from that. [12]
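In NLP this slogan is often paired with J. R. Firth's distributional hypothesis, "You shall know a word by the company it keeps" (Firth, 1957): a word's meaning is approximated by the contexts in which it is used. What follows is a minimal sketch of that idea, not a method taken from any of the works cited here. It represents each word by counts of the words occurring within two positions of it in a tiny corpus, then compares words by cosine similarity. The corpus, the window size, and the helper names (`cooccurrence_vectors`, `cosine`) are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# A tiny, made-up corpus (hypothetical; chosen only for illustration).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "a cat chases a mouse",
    "a dog chases a ball",
    "the king rules the land",
    "the queen rules the land",
]

WINDOW = 2  # context window size (an assumption made for this sketch)


def cooccurrence_vectors(sentences, window=WINDOW):
    """Represent each word by counts of the words within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(corpus)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))   # → 0.875
print(round(cosine(vectors["cat"], vectors["land"]), 3))  # → 0.25
```

"cat" and "dog" never occur in the same sentence, yet they come out similar because they are used in similar contexts. Learned embeddings such as word2vec replace raw counts with dense vectors trained on a context-prediction task, but rest on the same idea.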

Word meanings in deep learning

Word embeddings

Figure 1: Left panel shows vector offsets for three word pairs illustrating the gender relation. Right panel shows a different projection, and the singular/plural relation for two words. In high-dimensional space, multiple relations can be embedded for a single word (Mikolov, Yih, & Zweig, 2013).
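The vector-offset idea in Figure 1 can be sketched with a handful of hand-made vectors. This is a toy under explicit assumptions: the three dimensions (roughly "royalty", "gender", and "plurality") and all values are invented so that the gender and singular/plural relations are exact constant offsets, whereas learned embeddings typically have hundreds of dimensions and the regularities hold only approximately. The hypothetical `analogy` helper answers a : b :: c : ? by returning the word whose vector is most cosine-similar to b - a + c, in the spirit of the vector offset method of Mikolov, Yih, & Zweig (2013); excluding the three query words from the candidates is a common convention, assumed here.

```python
import math

# Hand-made 3-dimensional "embeddings" (hypothetical values, not learned from data).
# Dimension 0 ~ royalty, dimension 1 ~ gender (+ male, - female), dimension 2 ~ plurality.
embeddings = {
    "king":   [0.9,  0.7, 0.0],
    "queen":  [0.9, -0.7, 0.0],
    "man":    [0.1,  0.7, 0.0],
    "woman":  [0.1, -0.7, 0.0],
    "kings":  [0.9,  0.7, 0.8],
    "queens": [0.9, -0.7, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vocab):
    """Answer a : b :: c : ? with the word closest to vec(b) - vec(a) + vec(c)."""
    target = [xb - xa + xc for xa, xb, xc in zip(vocab[a], vocab[b], vocab[c])]
    # Exclude the query words themselves (a common convention, assumed here).
    candidates = [w for w in vocab if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vocab[w], target))


print(analogy("man", "woman", "king", embeddings))    # gender offset → queen
print(analogy("king", "kings", "queen", embeddings))  # singular/plural offset → queens
```

With vectors learned from real text, the nearest neighbor of the target is not always the expected word.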

Transformers

Figure 2: Meme about the fame of the transformer network architecture (source: @mishig25).

Generative language modeling

Discussion

Piantadosi:

Modern large language models integrate syntax and semantics in the underlying representations: encoding words as vectors in a high-dimensional space, without an effort to separate out e.g. part of speech categories from semantic representations, or even predict at any level of analysis other than the literal word. Part of making these models work well was in determining how to encode semantic properties into vectors, and in fact initializing word vectors via encodings of distribution semantics from e.g. Mikolov et al. 2013 (Radford et al. 2019). Thus, an assumption of the autonomy of syntax is not required to make models that predict syntactic material and may well hinder it. [18]
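To make concrete what it means to predict "the literal word" from word vectors, here is a minimal sketch of the output step of a word-level language model: a context vector h is scored against every row of an embedding matrix E, and a softmax turns the scores into a probability distribution over the vocabulary. Nothing in the pipeline assigns part-of-speech categories; whatever syntactic or semantic information the model uses has to live in the same vectors. Several simplifications are assumptions made for illustration: the vocabulary is tiny, E is random and untrained (so the probabilities themselves are meaningless), the output layer reuses E (weight tying), and the hypothetical `encode_context` simply averages word vectors where a real model would use a trained network such as a transformer.

```python
import math
import random

# Hypothetical toy vocabulary; real vocabularies are far larger.
VOCAB = ["the", "cat", "dog", "drinks", "chases", "milk", "ball"]
DIM = 4  # embedding size (an assumption; real models use far more dimensions)
INDEX = {word: i for i, word in enumerate(VOCAB)}

# Randomly initialized, untrained embedding matrix E: one row per vocabulary word.
# (In practice E is learned, and may be initialized from pretrained distributional vectors.)
rng = random.Random(0)
E = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in VOCAB]


def encode_context(words):
    """Crude stand-in for a trained encoder such as a transformer: average the context vectors."""
    rows = [E[INDEX[w]] for w in words]
    return [sum(column) / len(rows) for column in zip(*rows)]


def next_word_distribution(words):
    """Return p(next word | words) = softmax(E h) over the whole vocabulary."""
    h = encode_context(words)
    # Score each vocabulary word by the dot product of its embedding with h
    # (reusing E as the output matrix, i.e. weight tying).
    scores = [sum(e * x for e, x in zip(row, h)) for row in E]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {word: e / total for word, e in zip(VOCAB, exps)}


dist = next_word_distribution(["the", "cat"])
for word, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{word:>7}  {p:.3f}")
print(round(sum(dist.values()), 6))  # → 1.0
```

Training would adjust E (and, in a real model, the encoder) so that the words that actually follow a context receive high probability; the prediction target stays the literal next word throughout.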

Conclusion

Bender & Lascarides:

While machine learning has made impressive progress on many separate tasks (given appropriately curated data sets), the keys to generalizing beyond any specific set of end-to-end systems lie in modeling the linguistic system itself such that that model can be reused across tasks. And this can only be done based on an understanding of how language works, including how sentences come to mean what they mean (semantics) and how speakers can use sentence meaning to convey communicative intent (pragmatics). [19]

TODO: Not acknowledging the evidence.

TODO: Fallacy of moving the goalposts.

References

Bender, E. M. & Lascarides, A. (2020). Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Morgan & Claypool. https://link.springer.com/book/10.1007/978-3-031-02172-5
Daitz, E. (1953). The picture theory of meaning. Mind, 62, 184–201. https://www.jstor.org/stable/2251383
Edmonds, D. (2020). The Murder of Professor Schlick: The Rise and Fall of the Vienna Circle. Princeton University Press.
Firth, J. R. (1957). A synopsis of linguistic theory, 1930-1955. In Studies in Linguistic Analysis (pp. 1–31). Oxford: Blackwell.
Gaskin, R. (2009). Realism and the picture theory of meaning. Philosophical Topics, 37, 49–62. https://www.jstor.org/stable/43154543
Keyt, D. (1964). Wittgenstein’s picture theory of language. The Philosophical Review, 73, 493–511. https://www.jstor.org/stable/2183303
Mikolov, T. et al. (2013). Distributed representations of words and phrases and their compositionality. https://arxiv.org/abs/1310.4546
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. NAACL HLT 2013. https://www.aclweb.org/anthology/N13-1090.pdf
Perone, C. S. (2018). NLP word representations and the Wittgenstein philosophy of language. http://blog.christianperone.com/2018/05/nlp-word-representations-and-the-wittgenstein-philosophy-of-language/
Piantadosi, S. T. (2023). Modern language models refute Chomsky’s approach to language. https://lingbuzz.net/lingbuzz/007180
Putnam, H. (1973). Meaning and reference. The Journal of Philosophy, 70, 699–711. https://www.jstor.org/stable/2025079
———. (1975). The meaning of "meaning". In Mind, Language and Reality. Philosophical Papers, vol. 2 (pp. 215–271). Cambridge University Press.
Skelac, I. & Jandric, A. (2020). Meaning as use: From Wittgenstein to Google’s Word2vec. In S. Skansi (Ed.), Guide To Deep Learning Basics: Logical, Historical And Philosophical Perspectives (pp. 41–53). Springer.
Vaswani, A. et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017, 5998–6008. https://arxiv.org/abs/1706.03762
Wittgenstein, L. (1961). Tractatus Logico-Philosophicus. (D. F. Pears & B. F. McGuinness, Trans.). Routledge. (Originally published in 1922). https://people.umass.edu/klement/tlp/tlp.html
———. (2009). Philosophical Investigations. (G. E. M. Anscombe, P. M. S. Hacker, & J. Schulte, Trans.; P. M. S. Hacker & J. Schulte, Eds.) (4th ed.). Wiley-Blackwell. (Originally published in 1953).
Wolfram, S. (2023). What is ChatGPT doing—and why does it work? https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

  1. Putnam (1973).

  2. Putnam (1975).

  3. Edmonds (2020).

  4. Wittgenstein (1961).

  5. Wittgenstein (2009).

  6. Wittgenstein (1961), p. TODO

  7. Daitz (1953).

  8. Keyt (1964).

  9. Gaskin (2009).

  10. Firth (1957).

  11. Wittgenstein (2009), §43.

  12. Wittgenstein (2009), §340.

  13. Mikolov, Chen, Corrado, & Dean (2013), Mikolov, T. et al. (2013), and Mikolov, Yih, & Zweig (2013).

  14. Perone (2018).

  15. Vaswani et al. (2017).

  16. Wolfram (2023).

  17. Skelac & Jandric (2020).

  18. Piantadosi (2023), p. 15.

  19. Bender & Lascarides (2020).