Goal of the Scheduler
Minimize the total delay of the gNB while considering the priority of each attached UE
- As analyzed earlier, one issue with the QoS scheduler is the unfair distribution of delay between Delay-Critical GBR (DC-GBR) traffic and other traffic types.
- To address this issue, the proposed scheduler focuses on minimizing the total delay.
In Terms of Usability
The scheduler’s design should be simple and easy to understand so that it can be readily reused.
State (Input)
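Each element of a UE’s state vector is defined as follows:
- Default priority level: the default priority level of the UE’s traffic
- Head-of-line (HOL) delay: how long the packet at the head of the UE’s queue has been waiting
- Radio Network Temporary Identifier (RNTI): the identifier of the UE within the cell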
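The state definition above can be sketched as a small per-UE record that is stacked into one input row per attached UE. This is a minimal illustration under stated assumptions, not the scheduler’s actual implementation: the names `UEState` and `build_state`, the field names, and the millisecond unit for HOL delay are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UEState:
    """Hypothetical container for one UE's state (names and units are assumptions)."""
    default_priority: int   # default priority level of the UE's traffic
    hol_delay_ms: float     # head-of-line delay, assumed to be in milliseconds
    rnti: int               # Radio Network Temporary Identifier

    def to_vector(self) -> list[float]:
        # Element order follows the state definition: priority, HOL delay, RNTI
        return [float(self.default_priority), self.hol_delay_ms, float(self.rnti)]


def build_state(ues: list[UEState]) -> list[list[float]]:
    """Stack per-UE vectors into the gNB-level input, one row per attached UE."""
    return [ue.to_vector() for ue in ues]


if __name__ == "__main__":
    ues = [
        UEState(default_priority=19, hol_delay_ms=4.0, rnti=0x4601),
        UEState(default_priority=60, hol_delay_ms=12.5, rnti=0x4602),
    ]
    print(build_state(ues))  # [[19.0, 4.0, 17921.0], [60.0, 12.5, 17922.0]]
```

Since an RNTI is an identifier rather than a measured quantity, one design option worth considering is to keep it out of the numeric features and use it only to map each output weight back to its UE.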
Action (Output)
The action is the weight vector of the UEs; each element is the weight assigned to the corresponding UE.
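One possible way to obtain a valid weight vector from a learned policy is to apply a softmax to raw per-UE scores, which makes every weight positive and the weights sum to 1. The softmax choice and the helpers `scores_to_weights` and `rank_by_weight` are hypothetical; the ranking helper only illustrates one way the weights might be consumed by the scheduler.

```python
import math


def scores_to_weights(scores: list[float]) -> list[float]:
    """Turn raw per-UE policy scores into weights that are positive and sum to 1.

    Softmax is an assumed choice; subtracting the max keeps exp() from overflowing.
    """
    if not scores:
        return []
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_by_weight(rntis: list[int], weights: list[float]) -> list[int]:
    """Hypothetical use of the weights: order UEs (by RNTI) from highest to lowest weight."""
    pairs = sorted(zip(weights, rntis), key=lambda p: p[0], reverse=True)
    return [rnti for _, rnti in pairs]


if __name__ == "__main__":
    w = scores_to_weights([0.0, 1.0, 2.0])
    print(w)
    print(rank_by_weight([17921, 17922, 17923], w))  # [17923, 17922, 17921]
```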
Reward
A reward is computed for the weight assigned to each UE.
Option 1
✅ Advantages
- Clear Penalty: Explicitly penalizes high delay, especially for higher-priority UEs, encouraging the network to optimize performance.
- Intuitive Interpretation: Directly expresses that high delay for higher-priority UEs is undesirable, guiding learning toward better performance.
❌ Disadvantages
- Complexity of Negative Values: Negative rewards can make convergence harder for some algorithms, complicating policy evaluation and updates.
- Impact of Scale: Large negative values can slow down or destabilize learning.
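To make these trade-offs concrete, here is a sketch of a penalty-style reward. The form r = -(p · d), with p a priority factor and d the HOL delay, is an assumption chosen to match the listed properties and is not necessarily the Option 1 expression. `penalty_reward` is a hypothetical name, and p is assumed to grow with importance (a 3GPP-style priority level, where a lower value means higher priority, would need to be inverted first).

```python
def penalty_reward(priority_factor: float, hol_delay_ms: float) -> float:
    """Hypothetical Option-1-style reward: r = -(priority_factor * hol_delay_ms).

    priority_factor is assumed to grow with importance, so the same delay costs
    more for a higher-priority UE. The result is always <= 0 and unbounded below.
    """
    if priority_factor < 0 or hol_delay_ms < 0:
        raise ValueError("priority_factor and hol_delay_ms must be non-negative")
    return -priority_factor * hol_delay_ms


if __name__ == "__main__":
    print(penalty_reward(2.0, 10.0))    # -20.0
    print(penalty_reward(1.0, 10.0))    # -10.0
    print(penalty_reward(2.0, 1000.0))  # -2000.0
```

With p = 2, a 10 ms delay gives -20 while a 1000 ms delay gives -2000, which illustrates the scale issue listed among the disadvantages.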
Option 2
✅ Advantages
- Limited Range of Values: Rewards fall between 0 and 1, which can make learning more stable. This scheme rewards favorable network states and actions.
- Positive Reinforcement: All rewards are positive, promoting better actions through positive reinforcement.
❌ Disadvantages
- Non-linearity: Rewards change non-linearly, making them harder to interpret and less intuitive, especially in early learning stages.
- Impact of Small Values: Rewards can become very small, slowing down learning and making the system highly sensitive to minor changes.
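A bounded reward can be sketched with an exponential decay, r = exp(-p · d / τ), where p is a priority factor, d the HOL delay, and τ a scale constant. This is an assumed form that exhibits the listed properties (values in (0, 1], non-linear change, possibly very small values) and is not necessarily the Option 2 expression; `bounded_reward` and `scale_ms` are hypothetical.

```python
import math


def bounded_reward(priority_factor: float, hol_delay_ms: float, scale_ms: float = 10.0) -> float:
    """Hypothetical Option-2-style reward: r = exp(-priority_factor * hol_delay_ms / scale_ms).

    Zero delay gives 1.0, and the reward decays non-linearly toward 0 as the
    priority-weighted delay grows. In floating point it can underflow to 0.0 for
    very large delays, an extreme case of the small-value issue. scale_ms is an
    assumed tuning constant.
    """
    if priority_factor < 0 or hol_delay_ms < 0 or scale_ms <= 0:
        raise ValueError("inputs must be non-negative and scale_ms must be positive")
    return math.exp(-priority_factor * hol_delay_ms / scale_ms)


if __name__ == "__main__":
    for d in (0.0, 10.0, 20.0, 50.0):
        print(d, bounded_reward(1.0, d))
```

Each additional τ of priority-weighted delay multiplies the reward by e^-1 (about 0.37), so the absolute change shrinks as delay grows, which matches the non-linearity and small-value disadvantages listed above.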
Total Reward
The total reward of the gNB for the current input (state) and output (action) in the current slot is as follows: