## Reducing atomic reference counting

The final optimization we will discuss in this article is how the new scheduler reduces the number of atomic reference counting operations needed.
There are many outstanding references to the task structure: the scheduler and each waker hold a handle. A common way to manage this memory is to use atomic reference counting. This strategy requires an atomic operation each time a reference is cloned and another each time a reference is dropped. When the final reference goes out of scope, the memory is freed.
In the old Tokio scheduler, each waker held a counted reference to the task handle, roughly:
```rust
struct Waker {
    task: Arc<Task>,
}

impl Waker {
    fn wake(&self) {
        let task = self.task.clone();
        task.scheduler.schedule(task);
    }
}
```
When the task is woken, the reference is cloned (an atomic increment). The reference is then pushed into the run queue. When the processor receives the task and is done executing it, it drops the reference, resulting in an atomic decrement. These atomic operations add up.
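As a minimal sketch of that reference-count traffic, the following example uses a bare `Arc` in place of the real task structure (the empty `Task` type is a stand-in for illustration, not Tokio's):

```rust
use std::sync::Arc;

struct Task; // stand-in for the real task structure

fn main() {
    let task = Arc::new(Task);
    assert_eq!(Arc::strong_count(&task), 1);

    // Waking clones the handle before pushing it into the run queue:
    // one atomic increment.
    let queued = task.clone();
    assert_eq!(Arc::strong_count(&task), 2);

    // The processor finishes running the task and drops its handle:
    // one atomic decrement.
    drop(queued);
    assert_eq!(Arc::strong_count(&task), 1);
}
```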
This problem was previously identified by the designers of the `std::future` task system. They observed that when `Waker::wake` is called, the original waker reference is often no longer needed. This allows the atomic reference count to be reused when pushing the task into the run queue. The `std::future` task system now includes two "wake" APIs:
- `wake`, which takes `self`
- `wake_by_ref`, which takes `&self`
This API design pushes the caller to use `wake`, which avoids the atomic increment. The implementation now becomes:
```rust
impl Waker {
    fn wake(self) {
        // Taking `self` lets the existing reference be moved into the
        // run queue: no atomic increment.
        self.task.scheduler.schedule(self.task);
    }

    fn wake_by_ref(&self) {
        // Only a reference is available, so the `Arc` must be cloned:
        // an atomic increment.
        let task = self.task.clone();
        task.scheduler.schedule(task);
    }
}
```
This avoids the overhead of additional reference counting, but only when it is possible to take ownership of the waker in order to wake. In my experience, it is almost always desirable to wake with `&self` instead. Waking with `self` prevents reusing the waker (useful in cases where the resource sends many values, i.e. channels, sockets, ...), and it is also more difficult to implement thread-safe waking when `self` is required (the details of this will be left to another article).
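To illustrate why `&self` is usually what a resource has on hand, here is a hedged sketch of a hypothetical channel-like resource that stores the receiver's waker and wakes it on every send (the `Channel` type and `send_value` method are invented for this example):

```rust
use std::task::Waker;

/// Hypothetical channel-like resource that remembers the receiver's waker.
struct Channel {
    rx_waker: Option<Waker>,
}

impl Channel {
    fn send_value(&mut self) {
        // ... enqueue the value ...
        if let Some(waker) = &self.rx_waker {
            // `wake_by_ref` lets the stored waker be reused for the next
            // send; `wake` would consume it, forcing a clone (and thus an
            // atomic increment) before every wake.
            waker.wake_by_ref();
        }
    }
}
```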
The new scheduler sidesteps the entire "wake by `self`" issue by avoiding the atomic increment in `wake_by_ref`, making it as efficient as `wake(self)`. This is done by having the scheduler maintain a list of all tasks that are currently active (i.e. have not yet completed). This list represents the reference count needed to push a task into the run queue.
The difficulty with this optimization is ensuring that the scheduler does not drop any tasks from its list until it can be guaranteed that the task will not be pushed into the run queue again. The specifics of how this is managed are beyond the scope of this article, but I urge you to investigate it further in the source.
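As a rough mental model only, and not Tokio's actual implementation, the sketch below shows a hypothetical single-threaded scheduler whose `owned` list holds the one counted reference per live task, so that waking can push a plain task ID into the run queue without touching a reference count:

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

struct Task; // stand-in for the real task structure

struct Scheduler {
    /// One `Arc` per task that has not yet completed; this reference is
    /// what backs entries in the run queue.
    owned: HashMap<usize, Arc<Task>>,
    /// The run queue stores task IDs only, so pushing a woken task is not
    /// an atomic reference-count operation.
    run_queue: VecDeque<usize>,
}

impl Scheduler {
    fn schedule(&mut self, task_id: usize) {
        debug_assert!(self.owned.contains_key(&task_id));
        self.run_queue.push_back(task_id);
    }

    fn complete(&mut self, task_id: usize) {
        // A task may only leave the `owned` list once it can never be
        // woken again; guaranteeing that is the hard part mentioned above.
        self.owned.remove(&task_id);
    }
}
```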