GSoC 2024

Simpler Scheduler Design

mye280c37 • Wed Jul 03 2024

#ns-3 #nr #QoS #scheduling #RL

Goal of the Scheduler

Minimize the total delay at the gNB while accounting for the priority of each attached UE.

As analyzed before, one of the issues in the QoS scheduler is the unfairness of delay between DC-GBR and other types of traffic.
To address this issue, the designed scheduler focuses on minimizing the total delay.

In Terms of Usability

The scheduler’s design should be simple and easy to understand so that it can be reused with little effort.

State (Input)

\begin{aligned} X_t &= \begin{bmatrix} \overrightarrow{x_1} & \overrightarrow{x_2} & \cdots & \overrightarrow{x_n} \end{bmatrix}, \\ \text{where } \overrightarrow{x_i} &= [\text{RNTI}_i, P_i, \text{HOL}_i] \end{aligned}

Each element of the $i$-th UE’s state vector $\overrightarrow{x_i}$ is defined as follows:

$\text{RNTI}_i$ : Radio Network Temporary Identifier
$P_i$ : Default priority level (a lower value denotes a higher priority, hence the $100-P_i$ term in the rewards below)
$\text{HOL}_i$ : Head-of-line delay
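
To make the layout of the state concrete, here is a minimal Python sketch of the per-UE vectors. The `UeState` class, the `build_state` helper, the millisecond unit, and the sample values are all hypothetical names and numbers chosen for illustration; they are not part of the ns-3 NR API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UeState:
    """Per-UE state vector x_i = [RNTI_i, P_i, HOL_i] (hypothetical helper)."""
    rnti: int         # Radio Network Temporary Identifier
    priority: int     # default priority level P_i
    hol_delay: float  # head-of-line delay HOL_i (unit assumed here: ms)

    def as_vector(self) -> List[float]:
        return [self.rnti, self.priority, self.hol_delay]


def build_state(ues: List[UeState]) -> List[List[float]]:
    """Collect the per-UE vectors x_1..x_n that make up the state."""
    return [ue.as_vector() for ue in ues]


# Illustrative values only.
ues = [UeState(rnti=1, priority=20, hol_delay=5.0),
       UeState(rnti=2, priority=80, hol_delay=12.0)]
X = build_state(ues)
```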

Action (Output)

$W_t$ is the weight vector of the $n$ UEs; each element $w_i$ is the weight of the $i$-th UE.

W_t=[w_1, w_2, \cdots, w_n]

Reward

A reward $r_i$ is received for each UE’s weight $w_i$.

Option 1

r_i= -(100-P_i)\times\text{HOL}_i

✅ Advantages

Clear Penalty: Provides a clear penalty when higher-priority traffic experiences high delay, encouraging the network to optimize performance.
Intuitive Interpretation: Reflects that high delay for higher-priority traffic is undesirable, guiding the learning process towards performance optimization.

❌ Disadvantages

Complexity of Negative Values: Negative rewards can make convergence difficult for some algorithms, complicating policy evaluation and updates.
Impact of Scale: Large negative values can slow down or destabilize learning.
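
As a quick illustration of how Option 1 behaves, the sketch below evaluates the formula for two UEs with the same delay but different priority levels. The function name `reward_option1` and the sample values are assumptions made for this example.

```python
def reward_option1(priority: int, hol_delay: float) -> float:
    """Option 1: r_i = -(100 - P_i) * HOL_i."""
    return -(100 - priority) * hol_delay


# Same HOL delay, different priority levels (illustrative values only).
print(reward_option1(20, 5.0))  # -400.0
print(reward_option1(80, 5.0))  # -100.0
```

With equal delay, the UE with the smaller $P_i$ receives the larger penalty, and the penalty grows linearly with $\text{HOL}_i$.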

Option 2

r_i= \cfrac{1}{(100-P_i)\times\text{HOL}_i}

✅ Advantages

Limited Range of Values: Rewards are positive and stay between 0 and 1 whenever $(100-P_i)\times\text{HOL}_i \ge 1$, which can lead to more stable learning. The reward grows as the priority-weighted delay shrinks.
Positive Reinforcement: All rewards are positive, promoting better actions through positive reinforcement.

❌ Disadvantages

Non-linearity: Rewards change non-linearly, making them harder to interpret and less intuitive, especially in early learning stages.
Impact of Small Values: Rewards can become very small, slowing down learning and making the system highly sensitive to minor changes.
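
A small sketch of Option 2 follows. The formula is undefined when $\text{HOL}_i = 0$, and the post does not say how that case is handled, so the `min_denominator` clamp below is purely an assumption of this example; it also keeps the reward within (0, 1]. The function name and sample values are hypothetical.

```python
def reward_option2(priority: int, hol_delay: float,
                   min_denominator: float = 1.0) -> float:
    """Option 2: r_i = 1 / ((100 - P_i) * HOL_i).

    The denominator is clamped to `min_denominator` (an assumption of this
    sketch) to avoid division by zero when the queue is empty.
    """
    denominator = (100 - priority) * hol_delay
    return 1.0 / max(denominator, min_denominator)


# Illustrative values only.
print(reward_option2(20, 5.0))  # 0.0025
print(reward_option2(80, 5.0))  # 0.01
print(reward_option2(20, 0.0))  # 1.0 (clamped case)
```

The first two calls show how quickly the reward shrinks as the priority-weighted delay grows, which is the small-value sensitivity noted above.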

Total Reward

The total reward $R(X_t, W_t)$ of a gNB with the current input $X_t$ and output $W_t$ in slot $t$ is the sum of the per-UE rewards:

R(X_t,W_t) = \sum_{i=1}^n r_i
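
The summation can be sketched in a few lines of Python. This example uses the Option 1 reward for the per-UE term; the function names, the tuple layout `(RNTI, P, HOL)`, and the sample values are assumptions for illustration. As the formulas are written, $r_i$ depends only on $P_i$ and $\text{HOL}_i$, so the weights $W_t$ do not appear explicitly in the computation.

```python
from typing import Iterable, Tuple


def reward_option1(priority: int, hol_delay: float) -> float:
    """Per-UE reward, Option 1: r_i = -(100 - P_i) * HOL_i."""
    return -(100 - priority) * hol_delay


def total_reward(state: Iterable[Tuple[int, int, float]]) -> float:
    """R = sum of r_i over all UEs; each entry is (RNTI_i, P_i, HOL_i)."""
    return sum(reward_option1(p, hol) for _rnti, p, hol in state)


# Two UEs with illustrative values: (RNTI, P, HOL).
X_t = [(1, 20, 5.0), (2, 80, 12.0)]
print(total_reward(X_t))  # -640.0
```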
