Codatta Arena to Host Qwen3’s First Web3-Based Evaluation
The race to build the most powerful large language models (LLMs) is heating up, and Alibaba has just raised the stakes with the release of Qwen3, a formidable new family of AI models designed to compete with the best in the industry. Ranging from a lightweight 0.6 billion parameter model to the flagship Qwen3-235B-A22B, this series combines dense and Mixture of Experts (MoE) architectures to deliver state-of-the-art performance across domains like coding, mathematics, and general reasoning. But the real game-changer isn’t just the models themselves: it’s how they’re being evaluated.
For the first time, Qwen3 will face scrutiny in the Codatta Arena, the pioneering Web3-native platform for LLM evaluation. By leveraging decentralized data, transparent reward systems, and community-driven human preference scoring, Codatta Arena is redefining how AI models are tested and ranked. This article explores the significance of Qwen3’s release, its integration into Codatta Arena, and why this collaboration marks a turning point for open AI and Web3 innovation.
Let’s dive into what makes this moment so exciting.
Qwen3: Alibaba’s Bold Leap Forward
Alibaba’s Qwen3 series, officially released on April 29, 2025, is a testament to the company’s ambition to lead the global AI race. Building on the success of its predecessors (Qwen and Qwen2), Qwen3 introduces a diverse lineup of eight models, including six dense models (0.6B, 1.7B, 4B, 8B, 14B, and 32B parameters) and two MoE models (Qwen3-30B-A3B and Qwen3-235B-A22B). These models are open-weight, licensed under Apache 2.0, and available on platforms like Hugging Face, ModelScope, GitHub, and Kaggle, making them accessible to developers worldwide.
Key Features of Qwen3
- Hybrid Reasoning Modes: Qwen3 introduces a dual-mode approach, allowing seamless switching between Thinking Mode (for step-by-step reasoning on complex tasks like math and coding) and Non-Thinking Mode (for rapid, general-purpose responses). Developers can control the “thinking budget” to balance performance and computational efficiency, and the models support context lengths of up to 32K tokens natively, extendable to 128K on the larger variants.
- Mixture of Experts (MoE) Architecture: The MoE models, such as Qwen3-30B-A3B (30 billion total parameters, 3 billion active) and Qwen3-235B-A22B (235 billion total, 22 billion active), activate only a subset of parameters per token, delivering high performance at lower computational cost. For example, Qwen3-30B-A3B outperforms Alibaba’s earlier QwQ-32B while activating roughly a tenth of the parameters.
- Top-Tier Performance: Qwen3-235B-A22B achieves competitive results against leading models like DeepSeek-R1, OpenAI’s o1 and o3-mini, xAI’s Grok-3, and Google’s Gemini 2.5 Pro across benchmarks like AIME25 (math), LiveCodeBench (coding), and Arena-Hard (instruction following). Notably, even the smaller Qwen3-4B rivals the much larger Qwen2.5-72B-Instruct.
- Multilingual Mastery: Supporting 119 languages and dialects, Qwen3 excels in translation and multilingual instruction-following, making it a versatile tool for global applications.
- Agentic and Coding Capabilities: Optimized for coding and tool-using scenarios, Qwen3 supports the Model Context Protocol (MCP) and robust function-calling, enabling seamless integration with external tools and frameworks like SGLang, vLLM, and Ollama.
- Massive Training Data: Trained on a dataset of 36 trillion tokens (double that of Qwen2.5), Qwen3 leverages web content, documents, and synthetic data to enhance reasoning, instruction-following, and domain-specific tasks.
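The MoE mechanism described above can be illustrated with a minimal sketch: a gating network scores every expert for each token, only the top-k experts actually run, and their outputs are blended by the normalized gate weights. This is a generic top-k MoE router in plain Python, not Qwen3’s actual implementation; every name and dimension below is illustrative.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_weights, k=2):
    """Route one token through the top-k of N experts.

    experts: list of callables standing in for expert feed-forward nets.
    gate_weights: one weight vector per expert for the gating score.
    Only k experts are evaluated per token, which is why a model with
    235B total parameters can run with only ~22B active per token.
    """
    # Gating: score every expert, but run only the top-k.
    scores = [sum(w * x for w, x in zip(ws, token)) for ws in gate_weights]
    probs = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    # Weighted blend of the selected experts' outputs.
    out = [0.0] * len(token)
    for i in topk:
        y = experts[i](token)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, topk

# Toy setup: 8 "experts", each a fixed elementwise transform.
experts = [lambda t, s=s: [s * x for x in t] for s in range(1, 9)]
gate_weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]

out, active = moe_forward([0.5, -0.2, 0.1, 0.9], experts, gate_weights, k=2)
print(f"active experts: {sorted(active)} of 8")
```

The efficiency argument falls out directly: the cost of the forward pass scales with k, not with the total expert count, while the gating network is free to pick different experts for each token.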
Qwen3’s open-source nature and competitive performance position it as a serious contender in the LLM landscape, challenging both proprietary models and other open-source leaders like DeepSeek-R1.
As noted in a post on X, “You can now run a better model than OpenAI o3-mini or DeepSeek R1 100% locally. Qwen has just released Qwen 3 with an open-source license and 8 different sizes.”
Codatta Arena: The Web3-Native Evaluation Revolution
While Qwen3’s technical achievements are impressive, its integration into Codatta Arena marks a paradigm shift in how LLMs are evaluated. Unlike traditional evaluation methods, which rely on proprietary benchmarks or closed lab environments, Codatta Arena is the first Web3-native platform designed to democratize AI evaluation through community participation and decentralized infrastructure.
What Is Codatta Arena?
Codatta Arena, hosted at app.codatta.io/arena, is a groundbreaking platform that combines human preference scoring with Web3 principles to create transparent, reproducible, and community-driven LLM evaluations. Built on Codatta’s expertise in decentralized data and AI collaboration, the platform aims to move beyond static benchmarks by involving real users in the evaluation process. Key features include:
- Transparent Reward Systems: Evaluators earn on-chain rewards for contributing to model assessments, incentivizing broad participation and ensuring fairness.
- Decentralized Model Hosting: Models and their versions are stored on-chain, ensuring immutability and traceability.
- Open Data Pipelines: Evaluation data is openly accessible, promoting reproducibility and trust in the results.
- Cross-Evaluation Framework: Models are compared across diverse categories, from coding and reasoning to creative writing and multilingual tasks, reflecting real-world use cases.
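Codatta has not published its storage format, but the tamper-evidence that on-chain storage provides can be illustrated with a minimal hash-chain sketch: each evaluation record commits to the hash of the previous one, so editing any past record invalidates every later hash. All record fields and function names below are hypothetical, not Codatta’s actual schema.

```python
import hashlib
import json

def append_record(chain, record):
    """Append an evaluation record, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = {"prev": prev_hash, "record": record}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    chain.append({"prev": prev_hash, "record": record, "hash": digest})
    return chain

def verify_chain(chain):
    """Recompute every link; any edited record breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        payload = {"prev": prev_hash, "record": entry["record"]}
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != digest:
            return False
        prev_hash = digest
    return True

chain = []
append_record(chain, {"voter": "0xabc", "winner": "Qwen3-235B-A22B", "loser": "model-x"})
append_record(chain, {"voter": "0xdef", "winner": "model-x", "loser": "Qwen3-4B"})
print(verify_chain(chain))  # True: chain is intact

chain[0]["record"]["winner"] = "model-x"  # tamper with history
print(verify_chain(chain))  # False: tampering is detected
```

A real blockchain adds consensus and replication on top of this, but the chained-hash commitment is what makes the evaluation history auditable rather than merely published.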
Codatta Arena’s Web3 foundation aligns with the ethos of open AI, making it the perfect venue for evaluating Qwen3’s capabilities. As a recent X post by @codatta_io stated, “Qwen3’s open release is the perfect fit for this kind of community-powered evaluation.”
Why Qwen3 in Codatta Arena Matters
Traditional LLM evaluations often rely on curated benchmarks like MMLU or Arena-Hard, which, while valuable, may not fully capture a model’s real-world performance or user satisfaction. Codatta Arena addresses this by crowdsourcing human feedback, allowing users to vote on model outputs and compare Qwen3 against other frontier models like Grok-3, DeepSeek-R1, and OpenAI’s o-series. This approach offers several advantages:
- Community-Driven Insights: By involving a diverse pool of evaluators, Codatta Arena captures a broader range of perspectives, ensuring evaluations reflect real user preferences rather than lab-controlled metrics.
- Decentralized Transparency: On-chain reward systems and data pipelines eliminate gatekeeping, making the evaluation process open and auditable.
- Dynamic Rankings: The live leaderboard at app.codatta.io/arena provides real-time updates, allowing users to see how Qwen3 stacks up against competitors as evaluations progress.
- Incentivized Participation: Upcoming participation rewards (to be announced) will further encourage community involvement, fostering a vibrant ecosystem of evaluators and developers.
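Codatta has not disclosed its ranking formula, but crowdsourced pairwise votes are commonly turned into a live leaderboard with an Elo-style update, as popularized by Chatbot-Arena-style leaderboards. The sketch below shows that approach; it is illustrative only, and the model names are placeholders, not Codatta Arena’s actual algorithm or entries.

```python
def expected_score(ra, rb):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def record_vote(ratings, winner, loser, k=32):
    """Update two models' ratings after one human preference vote."""
    ra = ratings.setdefault(winner, 1000.0)
    rb = ratings.setdefault(loser, 1000.0)
    ea = expected_score(ra, rb)
    # The winner gains more when the win was unexpected (low ea).
    ratings[winner] = ra + k * (1.0 - ea)
    ratings[loser] = rb - k * (1.0 - ea)

ratings = {}
votes = [
    ("Qwen3-235B-A22B", "model-x"),
    ("Qwen3-235B-A22B", "model-y"),
    ("model-x", "model-y"),
]
for winner, loser in votes:
    record_vote(ratings, winner, loser)

leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
for name, rating in leaderboard:
    print(f"{name}: {rating:.0f}")
```

Because each vote adjusts only the two models involved, the leaderboard can update in real time as ballots arrive, which matches the "dynamic rankings" behavior described above.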
The integration of Qwen3 into Codatta Arena is a natural fit, given Alibaba’s commitment to open-source AI and Codatta’s mission to empower communities through decentralized data. As the Qwen team announced on X, “The long-awaited Qwen3 is finally here! … We’ve made significant progress in pretraining, large-scale reinforcement learning, and integration of reasoning modes.” This synergy positions Codatta Arena as the ideal platform to showcase Qwen3’s strengths and uncover its potential limitations.
How to Join the Qwen3 Evaluation
Getting involved in the Qwen3 evaluation is simple and open to anyone with an interest in AI, Web3, or community-driven innovation. Here’s how you can participate:
- Visit Codatta Arena: Head to app.codatta.io/arena to access the live leaderboard and explore the Qwen3 series.
- Vote on Model Outputs: Review and score Qwen3’s responses across various tasks, from coding challenges to creative writing prompts.
- Compare with Other Models: Pit Qwen3 against competitors like DeepSeek-R1, Grok-3, or Gemini 2.5 Pro to help determine its true ranking.
- Contribute to the Leaderboard: Your votes will shape the crowdsourced rankings, providing valuable insights for developers and users.
- Earn Rewards (Coming Soon): Stay tuned for participation rewards, which will be distributed via Codatta’s on-chain system.
By joining the evaluation, you’re not just testing a model — you’re helping to shape the future of open AI by contributing to a transparent, community-driven process. As @codatta_io emphasized on X, “anyone can participate in evaluating and shaping [Qwen3’s] future.”
Challenges and Opportunities
The Qwen3-Codatta Arena collaboration is a bold step forward, but it’s not without challenges. Key considerations include:
- Evaluator Diversity: Ensuring a broad and representative pool of evaluators is critical to avoiding bias in human preference scoring. Codatta’s decentralized approach helps mitigate this, but outreach and education will be essential.
- Scalability: Hosting and evaluating large models like Qwen3-235B-A22B requires significant computational resources. Codatta’s Web3 infrastructure, combined with Alibaba’s cloud integration, provides a strong foundation, but scaling to millions of users will be a test.
- Reward System Design: The upcoming participation rewards must be carefully structured to incentivize quality contributions without encouraging gaming of the system.
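Codatta’s reward design has not been announced, but one common guard against gaming in crowdsourced labeling is to weight rewards by each evaluator’s agreement with the per-item consensus, so random or adversarial voting earns less. The sketch below is purely illustrative; the scoring rule, reward pool, and field names are assumptions, not Codatta’s actual system.

```python
from collections import Counter, defaultdict

def consensus_rewards(votes, pool=100.0):
    """Split a reward pool by agreement with the majority vote per item.

    votes: list of (evaluator, item_id, choice) tuples.
    Evaluators who side with the per-item majority more often earn a
    larger share, which penalizes random or adversarial voting.
    """
    by_item = defaultdict(list)
    for evaluator, item, choice in votes:
        by_item[item].append((evaluator, choice))

    agreement = Counter()
    counted = Counter()
    for item, entries in by_item.items():
        majority, _ = Counter(c for _, c in entries).most_common(1)[0]
        for evaluator, choice in entries:
            counted[evaluator] += 1
            if choice == majority:
                agreement[evaluator] += 1

    # Each evaluator's score is their agreement rate; rewards are
    # proportional shares of the pool.
    scores = {e: agreement[e] / counted[e] for e in counted}
    total = sum(scores.values())
    return {e: pool * s / total for e, s in scores.items()}

votes = [
    ("alice", "q1", "A"), ("bob", "q1", "A"), ("carol", "q1", "B"),
    ("alice", "q2", "B"), ("bob", "q2", "B"), ("carol", "q2", "B"),
]
print(consensus_rewards(votes))
```

A production system would need refinements (minimum vote counts, tie handling, protection against majority collusion), but the core idea is that quality, not volume, drives payout.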
Despite these challenges, the opportunities are immense. Codatta Arena’s Web3-native approach could redefine LLM evaluation, making it more inclusive, transparent, and aligned with real-world needs. Qwen3’s open-weight release further amplifies this potential, enabling developers to fine-tune and deploy the models in diverse applications, from DeFi platforms to NFT marketplaces. As one X post highlighted, “Qwen3 is optimized for operation on the Alibaba Cloud, significantly reducing latency and enabling immediate scalability in line with business priorities.”
The Bigger Picture: Web3 and AI Convergence
The Qwen3-Codatta Arena collaboration is more than a single evaluation — it’s a milestone in the convergence of AI and Web3. As AI models like Qwen3 become more powerful, their integration into decentralized ecosystems is transforming industries. For example:
- Blockchain Optimization: Qwen3’s ability to generate code and verify smart contracts can enhance the efficiency and security of Web3 networks.
- Decentralized Finance (DeFi): AI-driven analytics, powered by models like Qwen3, can optimize yield strategies and predict market trends, democratizing access to sophisticated financial tools.
- NFT and Digital Art: As seen in Codatta’s earlier work with LoRA and Azuki-style NFTs, AI models can create branded, community-driven art, with Codatta Arena providing a platform to evaluate their creative outputs.
- Autonomous Agents: Some analysts project that more than one million AI agents could operate within Web3 networks by the end of 2025, managing investments, staking, and trading. Qwen3’s agentic capabilities position it as a key player in this space.
This convergence is reshaping the digital economy, and Codatta Arena’s role as a Web3-native evaluation platform ensures that AI development remains open, collaborative, and community-driven. As Alibaba’s Qwen team noted on X, “We believe that the release and open-sourcing of Qwen3 will significantly advance the research and development of large foundation models.”
Final Thoughts: A New Era for Open AI
Alibaba’s Qwen3 is a bold addition to the LLM ecosystem, combining cutting-edge performance, open-source accessibility, and efficient MoE architectures to challenge the industry’s top models. Its integration into Codatta Arena marks a pivotal moment, bringing Web3 principles to AI evaluation and empowering communities to shape the future of AI.
By hosting Qwen3’s first Web3-based evaluation, Codatta Arena is not just testing a model — it’s pioneering a new approach to AI development that prioritizes transparency, decentralization, and human feedback. As @codatta_io put it, “We’re just getting started. Let’s push the limits of open AI — together.”
Join the evaluation at app.codatta.io/arena, vote on Qwen3’s outputs, and help crowdsource the true ranking of LLMs. Together, we can build a more open, equitable, and innovative AI ecosystem — one evaluation at a time.