New Trends in Web3 Data Indexing Development: AI Empowering Full-Chain Data Services

From Data Source to Intelligent Analysis: Analyzing the Development of Web3 Data Indexing Track

1. Introduction

Since the launch of the first decentralized applications (dApps) in 2017, the blockchain ecosystem has flourished and new dApps have sprung up like mushrooms after rain. Yet when we discuss these applications, we rarely ask where the data they rely on comes from.

In 2024, artificial intelligence and Web3 have become hot topics. In the field of AI, data is like the source of life, driving the continuous evolution of intelligent systems. Just as plants need sunlight and moisture, AI systems also rely on massive amounts of data to learn and think. Without data support, even the most advanced AI algorithms struggle to realize their potential.

This article examines how blockchain data accessibility has evolved and compares traditional data indexing protocols with emerging blockchain data services. It pays particular attention to how newer protocols that integrate AI technology are innovating in data services and product architecture.


2. The Evolution of Data Indexing: From Blockchain Nodes to Full Chain Databases

2.1 Data Source: Blockchain Node

A blockchain is often described as a decentralized ledger, and blockchain nodes are the cornerstone of this network: they record, store, and propagate on-chain transaction data. Each full node maintains a complete copy of the blockchain data, which keeps the network decentralized. For ordinary users, however, building and maintaining a node has a high technical threshold and incurs significant hardware and bandwidth costs. The query capabilities of an ordinary node are also limited and often fall short of what developers need. So although anyone can run a node in theory, in practice most users rely on third-party services.

To solve this problem, RPC node providers have emerged. They bear the cost of node management and expose data through RPC endpoints. Public RPC endpoints are free but rate-limited, which can degrade the user experience of dApps. Private RPC endpoints perform better, but they remain inefficient for complex queries and are hard to scale across networks. Even so, the standardized APIs offered by node providers lower the barrier to accessing on-chain data and lay the foundation for later data parsing and applications.
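As a rough illustration of what talking to an RPC endpoint involves, the sketch below builds a JSON-RPC 2.0 request for the standard Ethereum method `eth_getBlockByNumber` and parses a response of the usual shape. No network call is made; the helper name `build_rpc_request` and the sample response are assumptions for illustration and are not taken from any provider's documentation.

```python
import json

def build_rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}

# Ask for the latest block; False means "transaction hashes only, not full objects".
request = build_rpc_request("eth_getBlockByNumber", ["latest", False])
body = json.dumps(request)  # this string would be POSTed to the RPC endpoint

# A trimmed, made-up response. Numeric quantities come back as hex strings.
sample_response = (
    '{"jsonrpc": "2.0", "id": 1, '
    '"result": {"number": "0x10d4f", "transactions": []}}'
)
block = json.loads(sample_response)["result"]
block_number = int(block["number"], 16)

print(body)
print(block_number)
```

Each question a dApp asks becomes one or more calls like this, which is why rate limits on public endpoints are felt so quickly and why complex queries are awkward to express at this level.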

2.2 Data Analysis: From Raw Data to Usable Information

The raw data provided by blockchain nodes is usually hashed and encoded rather than human-readable. This protects the integrity and security of the data but also makes it harder to parse. For ordinary users and developers, handling this data directly requires considerable technical knowledge and computational resources.

The data parsing process is particularly important in this context. By converting complex raw data into an easily understandable and operable format, users can utilize this information more intuitively. The quality of parsing directly affects the efficiency and effectiveness of blockchain data applications, making it a key link in the entire data indexing process.
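To make the parsing step concrete, here is a minimal sketch that decodes one raw event log into readable fields. It assumes the log is an ERC-20 `Transfer(address,address,uint256)` event, whose first topic is the well-known signature hash shown below; the sample log and the helper names are made up for illustration.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def topic_to_address(topic):
    """Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]

def decode_transfer(log):
    """Turn a raw ERC-20 Transfer log into a readable dict (hypothetical helper)."""
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int(log["data"], 16),  # un-indexed uint256 amount
    }

# Made-up sample log with the same layout a node would return.
sample_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + hex(1_000_000)[2:].rjust(64, "0"),
}

transfer = decode_transfer(sample_log)
print(transfer)
```

Real parsers work from a contract's ABI rather than a hard-coded event, but the principle is the same: fixed-width hex fields are mapped back to typed, named values.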

2.3 Development of Data Indexers

As the volume of blockchain data surges, the demand for data indexers has become increasingly prominent. Indexers organize on-chain data and load it into databases so that it can be queried. They expose the indexed data through APIs with structured query languages such as GraphQL, making it readily available. Indexers give developers a unified query interface and greatly simplify data retrieval.
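The core idea of an indexer, loading decoded on-chain records into a database and answering queries from it, can be sketched with Python's built-in `sqlite3` module. This is a toy under stated assumptions: the table layout, the function names, and the sample transfers are invented here, and production indexers use very different storage and expose GraphQL or similar APIs rather than raw SQL.

```python
import sqlite3

def build_index(transfers):
    """Load decoded transfer records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers "
        "(block INTEGER, sender TEXT, recipient TEXT, value INTEGER)"
    )
    # The index lets per-address lookups avoid scanning every row.
    conn.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    conn.executemany(
        "INSERT INTO transfers VALUES (:block, :sender, :recipient, :value)",
        transfers,
    )
    return conn

def total_received(conn, address):
    """Sum everything an address received; 0 if it never appears."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM transfers WHERE recipient = ?",
        (address,),
    ).fetchone()
    return row[0]

# Made-up sample data. Values are kept small because SQLite integers are
# 64-bit, while real token amounts are uint256 and need another encoding.
sample = [
    {"block": 100, "sender": "0xaaa", "recipient": "0xbbb", "value": 5},
    {"block": 101, "sender": "0xccc", "recipient": "0xbbb", "value": 7},
    {"block": 102, "sender": "0xbbb", "recipient": "0xaaa", "value": 2},
]
conn = build_index(sample)
print(total_received(conn, "0xbbb"))  # 12
```

Answering the same question through a plain RPC endpoint would mean fetching and decoding logs across the whole block range on every request, which is the inefficiency indexers remove.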

Different types of indexers each have their advantages:

  1. Full Node Indexer: Extracts data directly from full nodes, ensuring data integrity, but requires a large amount of storage and processing power.
  2. Lightweight Indexer: Relies on full nodes to fetch specific data, which reduces storage requirements but may increase query time.
  3. Dedicated Indexer: Optimized for specific data types or blockchains, such as NFT data or DeFi transactions.
  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface suitable for multi-chain dApps.
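As one way to picture the aggregated indexer in the list above, the following sketch merges per-chain event lists into a single time-ordered feed behind one function. The chain names, event fields, and the `aggregate` helper are assumptions for illustration; a real aggregator would also have to handle reorgs, differing block times, and off-chain sources.

```python
import heapq

def aggregate(streams):
    """Merge per-chain event lists (each already sorted by timestamp)
    into one ordered feed, tagging every event with its chain."""
    tagged = (
        [dict(event, chain=chain) for event in events]
        for chain, events in streams.items()
    )
    return list(heapq.merge(*tagged, key=lambda e: e["timestamp"]))

# Made-up sample events from two chains.
streams = {
    "ethereum": [{"timestamp": 10, "tx": "a"}, {"timestamp": 30, "tx": "b"}],
    "polygon": [{"timestamp": 20, "tx": "c"}],
}
feed = aggregate(streams)
for event in feed:
    print(event["timestamp"], event["chain"], event["tx"])
```

A multi-chain dApp can then read from `feed` without caring which chain each record came from.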

Currently, an Ethereum archive node occupies between 3TB and 13.5TB, depending on the client. Faced with this volume of data, mainstream indexing protocols not only support multi-chain indexing but also offer customizable data parsing frameworks for different application needs, such as The Graph's "subgraph" framework.

The emergence of indexers has significantly improved data indexing and query efficiency. Compared with traditional RPC endpoints, indexers handle large amounts of data efficiently and support complex queries and data filtering. Some indexers also aggregate data from multiple blockchains, so multi-chain dApps do not need to integrate a separate API for each chain. By operating in a distributed manner, indexers also offer stronger security and performance and reduce the risk of outages associated with centralized RPC providers.

![Reading, Indexing to Analyze, Brief Overview of the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)

2.4 Full-Chain Database: Aligning on Stream-First

As application requirements become more complex, basic data indexers and their standardized formats struggle to meet diverse query needs such as search, cross-chain access, or off-chain data mapping. Blockchain data service providers are therefore moving toward data streams to meet the demand for real-time parsing and comprehensive querying.

Traditional indexer service providers have launched data stream products, such as The Graph's Substreams and Goldsky's Mirror. At the same time, emerging services such as Chainbase and Subsquid offer real-time data lakes built from blockchain data. These services aim to support application development and on-chain data analysis by providing richer data sources.
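The stream-first idea can be sketched generically with Python generators: blocks flow through a source, a transform, and a sink one record at a time instead of being batch-loaded and queried later. This is not the API of Substreams, Mirror, or any product named above; every function name, field, and sample block here is a hypothetical stand-in.

```python
def block_stream(blocks):
    """Source stage: stand-in for a live feed, yielding one block at a time."""
    for block in blocks:
        yield block

def large_transfers(blocks, threshold):
    """Transform stage: flatten blocks into transfers and keep the large ones."""
    for block in blocks:
        for tx in block["transfers"]:
            if tx["value"] >= threshold:
                yield {"block": block["number"], **tx}

def sink(records):
    """Sink stage: collect results (a real sink might write to a data lake)."""
    return list(records)

# Made-up sample blocks.
blocks = [
    {"number": 1, "transfers": [{"to": "0xa", "value": 5}, {"to": "0xb", "value": 500}]},
    {"number": 2, "transfers": [{"to": "0xc", "value": 1000}]},
]
results = sink(large_transfers(block_stream(blocks), threshold=100))
print(results)
```

Because each stage pulls records lazily, a new block can be processed as soon as it arrives, which is the property real-time parsing depends on.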

Revisiting on-chain data through the lens of modern data pipelines, we can envision a future where high-performance datasets can be tailored for any business use case.

3. The Integration of AI and Databases: A Comparative Analysis of The Graph, Chainbase, and Space and Time

3.1 The Graph

The Graph network provides multi-chain data indexing and query services through decentralized nodes, making it easier for developers to build decentralized applications. Its core product model consists of a data query execution market and a data indexing cache market, which together serve users' query needs.

The network consists of four roles: indexers, curators, delegators, and developers, ensuring the system operates through economic incentives. Indexers provide indexing and query services, delegators support the operation of index nodes, curators select valuable subgraphs, and developers are the primary users.

The Graph ecosystem is actively embracing AI technology. Tools such as AutoAgora, Allocation Optimizer, and AgentC developed by Semiotic Labs optimize index pricing, resource allocation, and user query experience, enhancing the system's intelligence and user-friendliness.

![Reading, Indexing to Analyzing, Overview of Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-cf9a002b9b094fbbe3be7f611001b5c1.webp)

3.2 Chainbase

Chainbase, as a full-chain data network, integrates data from various blockchains to simplify the process for developers in building and maintaining applications. Its features include:

  • Real-time Data Lake: Provides instant access to blockchain data streams.
  • Dual-chain architecture: The execution layer is built on an EigenLayer AVS and runs in parallel with the CometBFT consensus algorithm, enhancing cross-chain data processing capabilities.
  • Innovative data format: Introduces the "manuscripts" standard to optimize how data is structured in the crypto industry.
  • Crypto World Model: Combines AI technology to create a model that can understand and predict blockchain transactions, such as its base model, Theia.

Chainbase's AI model Theia is based on NVIDIA's DORA model. It combines on-chain and off-chain data to learn and analyze crypto patterns and responds through causal reasoning, with the aim of uncovering the value of on-chain data and providing intelligent data services.

![Reading, Indexing to Analysis, Brief Overview of Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)

3.3 Space and Time

Space and Time (SxT) is committed to building a verifiable computing layer that extends zero-knowledge proofs to a decentralized data warehouse, providing trusted data processing for smart contracts, large language models, and enterprises.

SxT has introduced Proof of SQL, a zero-knowledge proof technique that makes the results of SQL queries executed on a decentralized data warehouse verifiable and tamper-proof. Unlike traditional blockchain networks, which rely on consensus mechanisms, SxT has one node produce the data while other nodes use zk technology to verify its authenticity, which improves system performance.

SxT collaborates with Microsoft's AI Lab to develop generative AI tools that let users work with blockchain data through natural language. In Space and Time Studio, users enter a natural language query, and the AI converts it to SQL, executes it, and presents the results.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-97443cbd177ac4ffd1665da670ffbf12.webp)

4. Conclusion and Outlook

Blockchain data indexing has improved step by step: from raw node data sources, through data parsing and indexers, to AI-enabled full-chain data services. These advances improve the efficiency and accuracy of data access and give users a more intelligent experience.

Looking ahead, as technologies such as AI and zero-knowledge proofs mature, blockchain data services are likely to become more intelligent and more secure. As infrastructure, they will continue to play an important role in industry innovation.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-0742180b7da8a9dcddafc465a4dba9cb.webp)
