The hidden costs of AI training models

In this post:

  • Developing AI models is expensive, costing hundreds of millions due to computing and data needs.
  • High data costs centralize AI development, raising ethical concerns.
  • Independent groups work on open datasets, with new strategies like synthetic data still in testing.

Building and supporting modern AI models requires significant investment, which may exceed hundreds of millions of dollars. Estimates indicate that these costs may reach a billion dollars in the near future.

This expenditure is driven mainly by computing power. Nvidia GPUs, for example, can cost about $30,000 each, and training at scale may require thousands of them. Researchers have also stated that the quality and quantity of the training dataset are very important to developing such models.
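To give a sense of scale, the hardware arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the $30,000 price is the figure cited above, while the GPU counts and the function name are assumptions chosen for the example, and the sketch ignores power, networking, and staffing costs.

```python
# Back-of-the-envelope hardware cost, using the ~$30,000 per GPU figure cited above.
# The GPU counts below are hypothetical, chosen only for illustration.
PRICE_PER_GPU_USD = 30_000


def gpu_hardware_cost(num_gpus: int, price_per_gpu: int = PRICE_PER_GPU_USD) -> int:
    """Return the purchase cost of num_gpus accelerators, in US dollars."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return num_gpus * price_per_gpu


if __name__ == "__main__":
    for count in (1_000, 5_000, 10_000):
        print(f"{count:>6,} GPUs -> ${gpu_hardware_cost(count):,}")
```

Under these assumptions, a cluster of 10,000 GPUs would cost $300 million in hardware alone, which is consistent with totals in the hundreds of millions of dollars.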

Industry leaders reveal staggering costs of AI development

According to James Betker of OpenAI, a model's performance is a function of its training data rather than its design or architecture. His assertion is that different models trained on the same large dataset will reach much the same results. Data, therefore, is the key to advancing AI technology.

Dario Amodei, CEO of the AI firm Anthropic, shared his insights on the financial side of these challenges on the In Good Company podcast. He stated that training current models, such as GPT-4, is estimated to cost around $100 million, and that training future models may require $10 billion to $100 billion within the next few years.

Generative AI models, including the ones created by large firms, are, at their core, statistical models: they draw on a large number of examples to predict the most probable outcomes. Kyle Lo from the Allen Institute for AI (AI2) says that performance gains can be attributed mostly to the data, especially when the training environment is consistent.
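The idea of predicting the most probable outcome from examples can be illustrated with a toy word-pair (bigram) counter in Python. This is a deliberately tiny sketch and not how production models are built; the corpus and the function names are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def most_probable_next(counts, word):
    """Return the most frequent follower of word, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical three-sentence corpus, for illustration only.
corpus = [
    "the model learns from data",
    "the model predicts the next word",
    "the data shapes the model",
]
counts = train_bigram(corpus)
print(most_probable_next(counts, "the"))  # "model" follows "the" most often here
```

With more example sentences, the counts shift and so do the predictions, which is the sense in which the data, rather than the code, determines the behavior.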

Data centralization raises ethical and accessibility concerns

The high cost of obtaining good-quality data is making AI development the preserve of a few large companies in the developed world. This concentration of resources also raises concerns about the availability of AI technology and the possibility of misuse.

OpenAI alone has spent hundreds of millions of dollars on data licenses, and Meta has considered purchasing publishers for data access. The AI training data market is expected to expand, and data brokers are likely to benefit from this opportunity. 

Problems also arise from questionable data acquisition practices. According to reports, many companies have collected large volumes of content without authorization from its owners, and some harvest data from various platforms without compensating users. As we previously reported, OpenAI used its Whisper audio transcription model to transcribe more than a million hours of YouTube videos to fine-tune GPT-4.

Organizations work to create open-access AI training datasets

Because the data acquisition race presents these problems, independent efforts are needed to make training datasets openly available. Some organizations, such as EleutherAI and Hugging Face, are creating large datasets that are publicly available for AI development.

The Wall Street Journal recently highlighted two potential strategies for solving data acquisition issues: synthetic data generation and curriculum learning. Synthetic data is created by AI models themselves, while curriculum learning feeds models high-quality data in a structured order so that they can make connections even with less data. However, both methods are still in development, and their efficacy has not yet been proven.
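As a rough illustration of the "structured order" idea behind curriculum learning, the Python sketch below sorts training examples from easy to hard and splits them into stages. The difficulty measure (word count) and the function names are assumptions made for this example; real curricula would use whatever difficulty signal suits the task.

```python
def difficulty(example: str) -> int:
    """Hypothetical difficulty proxy: longer examples are treated as harder."""
    return len(example.split())


def curriculum_stages(examples, num_stages):
    """Order examples from easy to hard and split them into up to num_stages groups."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not examples:
        return []
    ordered = sorted(examples, key=difficulty)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


# Placeholder examples, for illustration only.
examples = ["a b c d e", "a", "a b c", "a b"]
for stage_number, stage in enumerate(curriculum_stages(examples, 2), start=1):
    print(f"stage {stage_number}: {stage}")
```

A training loop would then present stage 1 before stage 2, so the model sees the simplest material first.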

Disclaimer. The information provided is not trading advice. Cryptopolitan.com holds no liability for any investments made based on the information provided on this page. We strongly recommend independent research and/or consultation with a qualified professional before making any investment decisions.
