This research paper explains trends in compute across three eras of machine learning.
The three critical components driving the evolution of modern machine learning are compute, data, and algorithmic (ML) advances. The article examines trends in the most easily quantifiable of these: compute. Before 2010, training compute grew in line with Moore's Law, doubling roughly every two years. Since the early 2010s, when Deep Learning took off, the growth rate has accelerated, with training compute doubling approximately every six months. At the end of 2015, a new trend emerged. Based on these observations, the history of compute in ML is divided into three eras: the pre-Deep-Learning era, the Deep Learning era, and the large-scale era. The article summarizes the rapidly growing compute requirements for training advanced ML systems.
The analysis is based on a dataset of 123 milestone ML systems, each annotated with the compute used to train it. Before Deep Learning took off, progress was slow. The trend accelerated in 2010 and has not slowed since. Separately, in 2015-2016, a new trend of large-scale models appeared, growing at a comparable pace but starting roughly two orders of magnitude above the previous trend.
Transition to deep learning
Two different trend regimes were observed before and after the advent of Deep Learning. Previously, the amount of compute needed to train machine learning systems doubled every 17 to 29 months. With Deep Learning, the trend accelerated, and training compute now doubles every 4-9 months. Moore's Law, the observation that transistor density doubles every two years (Moore, 1965), often simplified to computational performance doubling every two years, essentially matched the trend of the pre-Deep-Learning era. It is unclear exactly when the Deep Learning era began: the transition from the pre-Deep-Learning era shows no noticeable discontinuity, and the results hardly change whether the Deep Learning era is dated to 2010 or 2012.
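The doubling times above can be compared directly by converting each to an implied annual growth factor. The sketch below does this conversion; the helper name is our own, and the doubling times are taken from the text.

```python
# Sketch: convert a doubling time (in months) into an annual growth factor,
# to compare the pre-Deep-Learning trend with Moore's Law and the Deep Learning era.

def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor per year implied by a given doubling time in months."""
    return 2.0 ** (12.0 / doubling_months)

# Moore's Law: doubling every 24 months -> ~1.41x per year
moores_law = annual_growth_factor(24)

# Deep Learning era: doubling every ~6 months -> 4x per year
deep_learning_era = annual_growth_factor(6)

print(moores_law, deep_learning_era)
```

Under these figures, the Deep Learning era compounds compute nearly three times as fast per year as Moore's Law.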
Trends in the large-scale era
According to the data, a new trend of large-scale models started in 2015-2016 (see Figure 3). The trend began at the end of 2015 with AlphaGo and has continued to the present. These models were likely trained by large corporations whose greater training budgets allowed them to break away from the earlier trend.
Separately, the trend for regular-scale models was unaffected. It continues with the same slope before and after 2016, doubling every 5-6 months, as shown in Table 4.4. Compute growth for large-scale models appears slower, doubling every 9-10 months; this apparent slowdown could be noise, since there is little data on these models. The results contrast with Amodei & Hernandez (2018), who found a doubling period of 3.4 months between 2012 and 2018, and Lyzhov (2021), who found a doubling period greater than 2 years between 2018 and 2020. Those earlier assessments could not distinguish the two independent trends because the large-scale trend had only just emerged.
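Doubling times like these are typically estimated by fitting a line to the logarithm of training compute against time. The sketch below shows the idea; the data points are illustrative only, not the paper's dataset.

```python
# Sketch: estimate a doubling time by fitting log2(compute) vs. year.
# Illustrative data, assumed to follow a clean exponential for demonstration.
import numpy as np

def doubling_time_months(years, flops):
    """Fit log2(FLOP) ~ year and return the implied doubling time in months."""
    slope, _intercept = np.polyfit(
        np.asarray(years, dtype=float),
        np.log2(np.asarray(flops, dtype=float)),
        1,
    )
    # slope is doublings per year, so 12 / slope gives months per doubling
    return 12.0 / slope

# Illustrative points: training compute doubling every 6 months
years = [2016.0, 2016.5, 2017.0, 2017.5]
flops = [1e20, 2e20, 4e20, 8e20]
print(doubling_time_months(years, flops))  # ≈ 6 months
```

With noisy real-world data the fit would come with a confidence interval, which is why the text reports ranges such as 4-9 months rather than a single figure.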
The findings align with previous research, though they show more moderate scaling of training compute: an 18-month doubling time between 1952 and 2010, a 6-month doubling time between 2010 and 2022, and a new trend of large-scale models between late 2015 and 2022 that started 2-3 orders of magnitude higher and had a doubling time of about 10 months. To sum up, compute grew slowly before the Deep Learning era. With the transition to Deep Learning around 2010, the trend accelerated. At the end of 2015, companies began producing large-scale models that exceeded the trend, such as AlphaGo, signaling the start of the large-scale era. However, the distinction between large-scale and regular-scale models is not clear-cut and depends on how the models are framed. These findings highlight the growing strategic importance of hardware infrastructure and engineering talent: access to huge compute budgets or compute clusters, and the expertise to use them, has become synonymous with cutting-edge ML research.