The Data Bottleneck in AI: Unlocking Frontier Data for the Future of AGI and DeSci

Adiele Wisdom Nnamdi
Apr 8, 2025

Artificial Intelligence (AI) has made incredible strides over the past decade, with remarkable breakthroughs across industries from healthcare to finance. However, the next frontier in AI development, particularly the path toward Artificial General Intelligence (AGI), faces a crucial challenge that has little to do with algorithms or computational power: the lack of access to high-quality, domain-specific data, also known as frontier data. This data is essential for building specialized vertical AI agents and advancing scientific discovery, yet much of it remains untapped because of inefficiencies in the current data ecosystem.

The Challenge: Data Bottlenecks and the Need for Frontier Data
The biggest hurdle preventing AI from reaching its full potential is not only improving the algorithms that power machine learning models; it is securing the right data, especially frontier data. This kind of data is highly specialized and crucial for building vertical AI solutions tailored to particular industries or scientific fields. It is often inaccessible or underutilized, however, because developers and researchers have no efficient way to access, share, or monetize it.

As the AI landscape evolves, particularly within decentralized science (DeSci) and AI startups, the need for access to quality data has never been more pressing. Small, agile teams now have the potential to disrupt industries, but without the right data they will struggle to innovate. The current data ecosystem undervalues the contributions of data creators and often puts unnecessary barriers to entry in front of smaller teams and individuals. To unlock AI’s true potential, we must find a way to democratize access to high-quality data while ensuring that the value of that data is shared fairly with its creators.

The Royalty Model: Aligning Incentives for Data Creation
To overcome this bottleneck, we need to reimagine how data is shared, valued, and compensated. One promising solution is the royalty model, a system that aligns incentives between data creators and AI developers. Under the royalty model, developers do not pay for data upfront; instead, contributors are compensated according to the value their data later generates, so both sides share the risks and rewards of data creation. This reduces the upfront financial burden on developers, encourages collaboration, and creates a more equitable data ecosystem.

In this new paradigm, skilled contributors such as healthcare professionals, legal experts, and other domain specialists are more likely to participate in data creation and sharing because their compensation is tied to the value their data generates. This model fosters ongoing participation, which should help data quality improve over time. Linking compensation directly to the value of the data gives creators a reason to contribute more actively, leading to the higher-quality, domain-specific data that is crucial for vertical AI and AGI.
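
To make the idea of compensation tied to value concrete, here is a minimal sketch of one way a royalty pool could be split. It is an illustration only: the article does not specify how royalties are calculated, so the pro-rata rule, the function name split_royalties, the royalty rate in basis points, and the "usage" weights are all assumptions and do not describe how XnY or Codatta actually work.

```python
def split_royalties(revenue_cents, royalty_bps, usage_by_contributor):
    """Split a royalty pool among contributors in proportion to usage.

    All parameters are hypothetical, chosen for illustration:
      revenue_cents        -- revenue attributed to the dataset, in integer cents
      royalty_bps          -- share of revenue reserved for contributors,
                              in basis points (2000 = 20%)
      usage_by_contributor -- maps contributor name to a non-negative usage
                              weight, e.g. how often their records were used
    """
    if not 0 <= royalty_bps <= 10_000:
        raise ValueError("royalty_bps must be between 0 and 10000")
    if any(weight < 0 for weight in usage_by_contributor.values()):
        raise ValueError("usage weights must be non-negative")

    total_usage = sum(usage_by_contributor.values())
    if total_usage == 0:
        # Nothing was used, so nobody is paid in this period.
        return {name: 0 for name in usage_by_contributor}

    # Integer arithmetic avoids floating-point rounding on money.
    pool_cents = revenue_cents * royalty_bps // 10_000

    # Each payout is rounded down to a whole cent; any leftover cents
    # simply stay undistributed in this sketch.
    return {
        name: pool_cents * weight // total_usage
        for name, weight in usage_by_contributor.items()
    }


if __name__ == "__main__":
    payouts = split_royalties(
        revenue_cents=100_000,  # $1,000.00 of revenue
        royalty_bps=2_000,      # 20% goes to the royalty pool
        usage_by_contributor={"clinician_a": 600, "clinician_b": 300, "lawyer_c": 100},
    )
    print(payouts)  # {'clinician_a': 12000, 'clinician_b': 6000, 'lawyer_c': 2000}
```

In this sketch contributors earn nothing until the data produces revenue, which is the shared risk the royalty model describes, and the contributor whose data is used most earns the largest share.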

XnY: Assetifying Data for Vertical AI
To make this vision a reality, data must be treated as an asset. This means that data should be priced and traded just like any other valuable resource, such as stocks or commodities. Pricing data is hard, however, because its value depends on the context in which it is used and fluctuates over time. Solving this requires liquidity in the data market. With efficient pricing mechanisms, AI developers can trade data at fair market values, and data creators can be compensated according to the true value of their contributions.

XnY Network, a blockchain-based platform developed by Codatta Labs, is leading the charge in assetifying data. XnY turns data into digital assets that can be priced, traded, and owned just like any other financial asset. This blockchain infrastructure enables the transparent and efficient transfer of data ownership, which opens the door for AI developers to access the high-quality data needed to fuel innovation. By allowing data to be exchanged in a transparent and secure marketplace, XnY ensures that data creators receive fair compensation for their contributions.

Codatta: A Marketplace for Data
At the core of this ecosystem is Codatta, a data marketplace designed to connect AI developers and data creators directly.
Codatta operates similarly to an ad exchange, linking those who need data with those who can provide it. It is a permissionless platform, meaning that anyone can participate, and all transactions are verified using blockchain technology to ensure transparency and fairness.

Codatta is also building a data asset exchange where data can be traded and valued based on its utility. This exchange provides a way to establish the market value of data, which can fluctuate according to demand. By providing liquidity to the data market, Codatta accelerates the process of data discovery and enables AI developers and researchers to find the right data for their projects. In turn, this helps drive the development of vertical AI solutions and advances scientific knowledge through decentralized science (DeSci).

The Future of AI and Data: Unlocking the Power of Frontier Data
The importance of frontier data cannot be overstated when it comes to the future of AI. As AI systems continue to evolve, particularly in the context of AGI and vertical AI, the demand for high-quality, domain-specific data will only grow. Frontier data will increasingly become the fuel that powers these systems, driving performance, precision, and innovation in ways we are only beginning to imagine.

The royalty-based model, supported by platforms like XnY and Codatta, offers a promising solution to the data bottleneck. By ensuring that data creators are fairly compensated for their contributions and turning data into a tradable asset, we can build a sustainable, efficient, and transparent data ecosystem. This will not only accelerate progress toward AGI but also enable groundbreaking advancements in scientific research and vertical AI applications.

As Garry Tan, President and CEO of Y Combinator, aptly noted, “Vertical AI agents could be 10X bigger than SaaS.” Imagine the possibilities: AI-driven solutions that transform healthcare, legal systems, education, and beyond. The future is within reach, and it is powered by frontier data. By unlocking access to this valuable resource, we can build a better, more innovative world where data creators, developers, and researchers all share in the rewards.

Conclusion: A New Era for Data and AI
The future of AI is undeniably tied to the availability and accessibility of frontier data. The traditional data ecosystem, which often keeps high-quality data out of reach, must evolve to support the next wave of AI innovation. Through systems like the royalty model, blockchain-based infrastructure like XnY, and data marketplaces like Codatta, we can create a more equitable and efficient data economy.

As AI continues to evolve, the value of frontier data will rise, and those who contribute to its creation will play an essential role in shaping the future of AI and scientific discovery. The royalty model lets contributors share in the success of their data, turning the data they own into a valuable asset. In this new era, data will be the backbone of AGI, vertical AI, and decentralized science, transforming industries and improving lives in ways we have yet to fully realize.

By unlocking the true value of frontier data, we pave the way for a future where AI reaches its full potential, and the benefits of that progress are shared fairly among all who contribute.
