Query Routing

Unlike other decentralized LLM systems that require all network participants to compute and provide results for every user query, our protocol selects nodes for each query based on three key parameters: Uptime, Speed, and Capital Staked. In a distributed LLM network, we address the query routing challenge with the following objectives:

  1. Inference Quality: Ensure responses to user queries meet a defined standard, providing detailed, unmoderated, and insightful content.

  2. Scalability: Maintain the system’s capability to handle a high volume of queries efficiently.

  3. Computational Efficiency: Avoid unnecessary duplication of computations across all nodes.

  4. Incentivization: Properly reward node operators to encourage participation and high performance.

  5. Competitive Performance: Node operators compete on uptime, response speed, and capital staked, ensuring robust and fair distribution of queries.
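
To make the competitive routing concrete, the sketch below routes each incoming query to a node with probability proportional to an allocation score derived from these parameters; the score itself is defined by the procedure in the next section. The `Node` fields and the `route_query` helper are illustrative assumptions, not part of the protocol specification.

```python
import random
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    uptime: float            # fraction of the epoch the node was reachable (0..1)
    speed: float             # measured inference speed, e.g. tokens per second
    stake: float             # capital staked by the operator
    allocation_score: float = 0.0  # set by the per-epoch weighting procedure below

def route_query(nodes: list[Node]) -> Node:
    """Select one node per query, with probability proportional to its allocation score."""
    scores = [max(n.allocation_score, 0.0) for n in nodes]
    total = sum(scores)
    if total == 0.0:
        return random.choice(nodes)  # no scores yet: fall back to a uniform choice
    return random.choices(nodes, weights=scores, k=1)[0]
```

Because selection is probabilistic rather than winner-take-all, queries remain spread across the network while still favouring operators with better uptime, speed, and stake.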

Routing and Weighting Procedure

Our routing model monitors global network parameters from the previous epoch and calculates updated weights during the query assignment process. The procedure works as follows:

1. Initial Weights: Assign starting weights to each parameter, determining their influence on the likelihood of a node being selected for queries.

2. Observation and Comparison: At the end of each epoch, the network records the performance parameters and tokens staked by each node. These values are then compared with those of the previous epoch to measure improvement or decline.

3. Adjust Weights (see the sketch following this list):

  • For performance parameters (uptime, speed):

    • If the median value increases, the weight remains unchanged.

    • If the median value decreases, the weight is increased to incentivize nodes to improve performance.

  • For staked capital, a separate adjustment mechanism is applied to maintain fairness and decentralization.

4. Smoothing Mechanism: To prevent sudden and drastic weight fluctuations, we apply a moving average smoothing technique. Afterwards, weights are normalized so their total equals 1.

5. Allocation Scores: Each node is assigned an allocation score based on the epoch weights and its normalized uptime and inference-speed parameters. This score is then adjusted according to the capital staked by the node operator, with minimum and maximum stake thresholds applied: the minimum prevents “nothing at stake” attacks and the maximum mitigates centralization risks.
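
A minimal sketch of steps 2–4, assuming a fixed multiplicative bump when a parameter's network-wide median declines and an exponential moving average for smoothing; the exact increment, smoothing window, and the separate stake adjustment are protocol parameters that this illustration does not pin down.

```python
from statistics import median

def update_weights(prev_weights: dict[str, float],
                   prev_epoch: dict[str, list[float]],
                   curr_epoch: dict[str, list[float]],
                   bump: float = 1.1,
                   alpha: float = 0.3) -> dict[str, float]:
    """Per-epoch update of the performance-parameter weights (uptime, speed).

    Assumed details: a fixed multiplicative `bump` when a parameter's
    network-wide median declines, and exponential moving-average smoothing
    with factor `alpha` before normalization.
    """
    raw = {}
    for param, weight in prev_weights.items():
        if median(curr_epoch[param]) < median(prev_epoch[param]):
            raw[param] = weight * bump   # median declined: raise the weight to push improvement
        else:
            raw[param] = weight          # median held or improved: keep the weight unchanged
    # Smoothing: blend the adjusted weight with last epoch's weight to avoid abrupt swings.
    smoothed = {p: alpha * raw[p] + (1.0 - alpha) * prev_weights[p] for p in raw}
    # Normalize so the weights sum to 1.
    total = sum(smoothed.values())
    return {p: v / total for p, v in smoothed.items()}

# Example: speed's median dropped, so its weight rises relative to uptime's.
weights = update_weights(
    prev_weights={"uptime": 0.5, "speed": 0.5},
    prev_epoch={"uptime": [0.99, 0.95, 0.90], "speed": [40.0, 35.0, 30.0]},
    curr_epoch={"uptime": [0.99, 0.96, 0.91], "speed": [38.0, 33.0, 29.0]},
)
```

The adjustment is deliberately asymmetric: a weight only grows when its metric degrades, so selection pressure shifts toward whichever dimension the network is currently slipping on, while smoothing and normalization keep that shift gradual.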
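
For step 5, the allocation score could then be computed as below. The normalization by each epoch's best observed value, the multiplicative stake factor, and the specific clamp bounds are illustrative assumptions; only the use of epoch weights, normalized uptime and speed, a stake adjustment, and minimum/maximum stake thresholds comes from the procedure above.

```python
def allocation_score(uptime: float, speed: float, stake: float,
                     weights: dict[str, float],
                     best_uptime: float, best_speed: float,
                     min_stake: float, max_stake: float) -> float:
    """Combine weighted, normalized performance with a bounded stake adjustment.

    Nodes staking less than `min_stake` receive a zero score ("nothing at stake"
    protection); stake above `max_stake` earns no additional routing weight
    (centralization mitigation). The functional form is an assumption.
    """
    if stake < min_stake:
        return 0.0  # below the minimum stake threshold: never selected for queries
    # Performance term: epoch weights applied to parameters normalized by the
    # best value observed in the network during this epoch.
    performance = (weights["uptime"] * (uptime / best_uptime)
                   + weights["speed"] * (speed / best_speed))
    # Stake term: capped so capital beyond `max_stake` has no additional effect.
    stake_factor = min(stake, max_stake) / max_stake
    return performance * stake_factor

# Example: a node staking above the cap gets no extra credit beyond max_stake.
score = allocation_score(
    uptime=0.98, speed=37.0, stake=12_000.0,
    weights={"uptime": 0.49, "speed": 0.51},
    best_uptime=0.99, best_speed=42.0,
    min_stake=1_000.0, max_stake=10_000.0,
)
```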

This approach ensures that the distributed LLM network remains efficient, competitive, and resilient, while rewarding high-performing nodes and maintaining decentralization.

