GPT is a large language model that generates natural-language text from a given input. One of its remarkable abilities is that it can often perform simple calculations, such as addition, subtraction, multiplication and division, even though it was never explicitly trained to do so. How does a model trained only to predict text learn to calculate?
One possible explanation is that GPT learned simple calculations by exploiting statistical regularities in the large corpus of text it was trained on. For example, GPT has probably encountered many texts that contain numerical expressions together with their results, such as «two plus two equals four» or «five times six is thirty». From these examples, GPT may have learned to associate certain words and symbols with mathematical operations and values, and to infer the rules behind them. It may also have learned to generalize these rules to numerical expressions it has not seen before, such as «three minus one equals two» or «seven divided by two is three point five».
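The pattern-matching idea above can be illustrated with a deliberately crude sketch. It is not how GPT works internally: the corpus, the `train` and `predict_next` functions, and the counting scheme are all hypothetical, chosen only to show what "learning from statistical regularities" means in its simplest form. The toy model counts which word follows each prefix and predicts the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real training data is vastly larger and messier.
corpus = [
    "two plus two equals four",
    "two plus two equals four",
    "two plus two equals five",   # a noisy example, outvoted by the correct ones
    "five times six is thirty",
    "three plus one equals four",
]

def train(sentences):
    """Count which word follows each prefix of words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words)):
            counts[tuple(words[:i])][words[i]] += 1
    return counts

def predict_next(counts, prompt):
    """Return the most frequent continuation of the prompt, or None if unseen."""
    followers = counts.get(tuple(prompt.split()))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train(corpus)
print(predict_next(model, "two plus two equals"))     # four
print(predict_next(model, "five times six is"))       # thirty
print(predict_next(model, "three minus one equals"))  # None: never seen
```

The last call shows the limit of pure memorization: a lookup over seen prefixes cannot answer a new expression. If GPT does handle unseen expressions, it must have learned something more general than a table of counts, which is the generalization the paragraph above describes.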
Another possible explanation lies in GPT's attention mechanism and its large hidden state. The attention mechanism lets GPT weigh the relevant parts of the input, and of the text it has generated so far, when producing each new token. The hidden state is not a single memory vector: it is a set of vectors, one per token at each layer, that together encode the context processed so far. By using the attention mechanism and the hidden state, GPT may have learned to encode and manipulate numerical information in a way that resembles arithmetic computation. For example, when GPT encounters an input such as «what is four plus six?», it may use attention to pick out the numbers and the operation in the input, and use its hidden state to hold the intermediate and final results of the calculation.
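To make the "pick out the numbers and the operation" step concrete, here is a minimal sketch of scaled dot-product attention weights in plain Python. Everything in it is an assumption for illustration: the two-dimensional key vectors, the two query vectors, and the function names are hand-picked, whereas a real model learns high-dimensional vectors during training. The sketch shows only how attention distributes weight over tokens, not how a result such as ten would be computed.

```python
import math

tokens = ["what", "is", "four", "plus", "six", "?"]

# Hypothetical hand-picked key vectors: [is_number, is_operator].
keys = [
    [0.0, 0.0],  # what
    [0.0, 0.0],  # is
    [1.0, 0.0],  # four
    [0.0, 1.0],  # plus
    [1.0, 0.0],  # six
    [0.0, 0.0],  # ?
]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Hypothetical queries: one "looks for" numbers, the other for operators.
number_query = [4.0, 0.0]
operator_query = [0.0, 4.0]

number_weights = attention_weights(number_query, keys)
operator_weights = attention_weights(operator_query, keys)

for token, nw, ow in zip(tokens, number_weights, operator_weights):
    print(f"{token:>5}  number-query={nw:.3f}  operator-query={ow:.3f}")
```

With these vectors the number query puts most of its weight on «four» and «six», and the operator query on «plus». In a real transformer the weighted values would then be mixed into the hidden vectors and passed through further layers, which is where any actual computation would have to happen.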