"); //-->
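This article is shared from the 天翼云 (Tianyi Cloud) developer community: "Implementing a Consumer Priority Queue in Python" (《python 实现消费者优先级队列》), by Frost.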
Keywords
Condition variable, semaphore, consumer priority, fairness, heap queue algorithm
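Background
Message queues commonly let you assign a priority to each message, but few let you assign a priority to each consumer. Searching online, the author mostly found material on RabbitMQ's consumer priorities and nothing comparable for other languages.
The queues in the Python standard library behave fairly, at least in their pure Python implementations, and none of them offers a parameter to request non-fair behavior, so they may not fit some scenarios.
What do fair and non-fair mean? The terms usually describe mutexes. Threads competing to acquire a lock closely resemble consumers competing on a queue, so both are called waiters here.
Suppose four waiters A, B, C, and D call acquire()/get() in alphabetical order.
When a thread releases the lock, or a message is put on the queue, the waiters are woken in arrival order: A first, then B -> C -> D. In short, FIFO.
FIFO is a good general-purpose policy because it keeps any consumer from starving. In some scenarios, though, we want to give the queue's consumers priorities and always wake the highest-priority consumer that is still waiting.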
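To make the difference concrete, here is a minimal sketch that contrasts the two wake-up orders for the waiters A to D. The priority numbers are invented for illustration, and the sketch assumes that a smaller number means a higher priority. No real locks or threads are involved.

```python
# Waiters in arrival order as (name, priority).
# The priorities are hypothetical; a smaller number is assumed to mean "more urgent".
waiters = [("A", 2), ("B", 0), ("C", 1), ("D", 0)]

# Fair (FIFO): wake strictly in arrival order.
fifo_order = [name for name, _ in waiters]

# Priority-based: wake the most urgent waiter first.
# sorted() is stable, so waiters with equal priority keep their arrival order.
priority_order = [name for name, _ in sorted(waiters, key=lambda w: w[1])]

print(fifo_order)      # ['A', 'B', 'C', 'D']
print(priority_order)  # ['B', 'D', 'C', 'A']
```

Note that B and D share the same priority and are still served in arrival order, which is the tie-breaking behavior the implementation later in this article also aims for.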
A pure Python implementation follows.
Implementation
Start by reading the source of Python's built-in SimpleQueue (the pure Python version, located in Lib\queue.py).
class _PySimpleQueue:
    '''Simple, unbounded FIFO queue.

    This pure Python implementation is not reentrant.
    '''
    # Note: while this pure Python version provides fairness
    # (by using a threading.Semaphore which is itself fair, being based
    #  on threading.Condition), fairness is not part of the API contract.
    # This allows the C version to use a different implementation.

    def __init__(self):
        self._queue = deque()
        self._count = threading.Semaphore(0)

    def put(self, item, block=True, timeout=None):
        '''Put the item on the queue.

        The optional 'block' and 'timeout' arguments are ignored, as this method
        never blocks.  They are provided for compatibility with the Queue class.
        '''
        self._queue.append(item)
        self._count.release()

    def get(self, block=True, timeout=None):
        '''Remove and return an item from the queue.

        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until an item is available. If 'timeout' is
        a non-negative number, it blocks at most 'timeout' seconds and raises
        the Empty exception if no item was available within that time.
        Otherwise ('block' is false), return an item if one is immediately
        available, else raise the Empty exception ('timeout' is ignored
        in that case).
        '''
        if timeout is not None and timeout < 0:
            raise ValueError("'timeout' must be a non-negative number")
        if not self._count.acquire(block, timeout):
            raise Empty
        return self._queue.popleft()

    def put_nowait(self, item):
        '''Put an item into the queue without blocking.

        This is exactly equivalent to `put(item, block=False)` and is only provided
        for compatibility with the Queue class.
        '''
        return self.put(item, block=False)

    def get_nowait(self):
        '''Remove and return an item from the queue without blocking.

        Only get an item if one is immediately available. Otherwise
        raise the Empty exception.
        '''
        return self.get(block=False)

    def empty(self):
        '''Return True if the queue is empty, False otherwise (not reliable!).'''
        return len(self._queue) == 0

    def qsize(self):
        '''Return the approximate size of the queue (not reliable!).'''
        return len(self._queue)

    __class_getitem__ = classmethod(types.GenericAlias)
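The note in the class source explains that this pure Python queue is fair because the semaphore it uses is fair, while also pointing out that fairness is not part of the API contract.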
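For orientation, the public behavior described by those docstrings can be exercised through queue.SimpleQueue. This is only a usage sketch: on CPython, queue.SimpleQueue is normally backed by the C implementation rather than _PySimpleQueue, but the documented get/put semantics are the same.

```python
import queue

q = queue.SimpleQueue()
q.put("a")
q.put("b")

# Items come back in insertion order.
first = q.get()
second = q.get()

# With a timeout, get() raises queue.Empty once the timeout expires.
try:
    q.get(timeout=0.05)
    timed_out = False
except queue.Empty:
    timed_out = True

# get_nowait() raises queue.Empty immediately when nothing is available.
try:
    q.get_nowait()
    empty_now = False
except queue.Empty:
    empty_now = True

print(first, second, timed_out, empty_now)
```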
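The intuitive approach is to add a priority argument to the get method and to the semaphore's acquire method. So let's look at the semaphore implementation next and see whether that is feasible.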
class Semaphore:
    """This class implements semaphore objects.

    Semaphores manage a counter representing the number of release() calls minus
    the number of acquire() calls, plus an initial value. The acquire() method
    blocks if necessary until it can return without making the counter
    negative. If not given, value defaults to 1.

    """

    # After Tim Peters' semaphore class, but not quite the same (no maximum)

    def __init__(self, value=1):
        if value < 0:
            raise ValueError("semaphore initial value must be >= 0")
        self._cond = Condition(Lock())
        self._value = value

    def acquire(self, blocking=True, timeout=None):
        """Acquire a semaphore, decrementing the internal counter by one.

        When invoked without arguments: if the internal counter is larger than
        zero on entry, decrement it by one and return immediately. If it is zero
        on entry, block, waiting until some other thread has called release() to
        make it larger than zero. This is done with proper interlocking so that
        if multiple acquire() calls are blocked, release() will wake exactly one
        of them up. The implementation may pick one at random, so the order in
        which blocked threads are awakened should not be relied on. There is no
        return value in this case.

        When invoked with blocking set to true, do the same thing as when called
        without arguments, and return true.

        When invoked with blocking set to false, do not block. If a call without
        an argument would block, return false immediately; otherwise, do the
        same thing as when called without arguments, and return true.

        When invoked with a timeout other than None, it will block for at
        most timeout seconds.  If acquire does not complete successfully in
        that interval, return false.  Return true otherwise.

        """
        if not blocking and timeout is not None:
            raise ValueError("can't specify timeout for non-blocking acquire")
        rc = False
        endtime = None
        with self._cond:
            while self._value == 0:
                if not blocking:
                    break
                if timeout is not None:
                    if endtime is None:
                        endtime = _time() + timeout
                    else:
                        timeout = endtime - _time()
                        if timeout <= 0:
                            break
                self._cond.wait(timeout)
            else:
                self._value -= 1
                rc = True
        return rc

    __enter__ = acquire

    def release(self, n=1):
        """Release a semaphore, incrementing the internal counter by one or more.

        When the counter is zero on entry and another thread is waiting for it
        to become larger than zero again, wake up that thread.

        """
        if n < 1:
            raise ValueError('n must be one or more')
        with self._cond:
            self._value += n
            for i in range(n):
                self._cond.notify()

    def __exit__(self, t, v, tb):
        self.release()
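The acquire/release contract above can be checked from a single thread. The following sketch uses only the public threading.Semaphore API; release(2) relies on the n parameter shown in the source above, which assumes Python 3.9 or later.

```python
import threading

sem = threading.Semaphore(0)

# Counter is 0: a non-blocking acquire fails immediately.
got_before = sem.acquire(blocking=False)

# Raise the counter by two (the n argument assumes Python 3.9+).
sem.release(2)

got_first = sem.acquire(blocking=False)   # counter 2 -> 1
got_second = sem.acquire(timeout=0.05)    # counter 1 -> 0
got_third = sem.acquire(timeout=0.05)     # counter is 0 again: times out

# Combining blocking=False with a timeout is rejected.
try:
    sem.acquire(blocking=False, timeout=1)
    rejected = False
except ValueError:
    rejected = True

print(got_before, got_first, got_second, got_third, rejected)
```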
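As the code shows, the semaphore is built on a condition variable, so the priority value ultimately has to be added to the condition variable's wait method. Only the key parts of the source are shown below; elided method bodies are marked with "...".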
class Condition:
    """Class that implements a condition variable.

    A condition variable allows one or more threads to wait until they are
    notified by another thread.

    If the lock argument is given and not None, it must be a Lock or RLock
    object, and it is used as the underlying lock. Otherwise, a new RLock object
    is created and used as the underlying lock.

    """

    def __init__(self, lock=None):
        if lock is None:
            lock = RLock()
        self._lock = lock
        # Export the lock's acquire() and release() methods
        self.acquire = lock.acquire
        self.release = lock.release
        # If the lock defines _release_save() and/or _acquire_restore(),
        # these override the default implementations (which just call
        # release() and acquire() on the lock).  Ditto for _is_owned().
        try:
            self._release_save = lock._release_save
        except AttributeError:
            pass
        try:
            self._acquire_restore = lock._acquire_restore
        except AttributeError:
            pass
        try:
            self._is_owned = lock._is_owned
        except AttributeError:
            pass
        self._waiters = _deque()

    def _at_fork_reinit(self):...
    def __enter__(self):...
    def __exit__(self, *args):...
    def __repr__(self):...
    def _release_save(self):...
    def _acquire_restore(self, x):...
    def _is_owned(self):...

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.

        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).

        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.

        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                if timeout > 0:
                    gotit = waiter.acquire(True, timeout)
                else:
                    gotit = waiter.acquire(False)
            return gotit
        finally:
            self._acquire_restore(saved_state)
            if not gotit:
                try:
                    self._waiters.remove(waiter)
                except ValueError:
                    pass

    def wait_for(self, predicate, timeout=None):...

    def notify(self, n=1):
        """Wake up one or more threads waiting on this condition, if any.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method wakes up at most n of the threads waiting for the condition
        variable; it is a no-op if no threads are waiting.

        """
        if not self._is_owned():
            raise RuntimeError("cannot notify on un-acquired lock")
        waiters = self._waiters
        while waiters and n > 0:
            waiter = waiters[0]
            try:
                waiter.release()
            except RuntimeError:
                # gh-92530: The previous call of notify() released the lock,
                # but was interrupted before removing it from the queue.
                # It can happen if a signal handler raises an exception,
                # like CTRL+C which raises KeyboardInterrupt.
                pass
            else:
                n -= 1
            try:
                waiters.remove(waiter)
            except ValueError:
                pass

    def notify_all(self):...
    def notifyAll(self):...
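The key part is the notify method: it decides which waiter gets notified, and each waiter corresponds to one consumer of the queue. The line waiter = waiters[0] in notify is the FIFO rule in action, because it always picks the entry that has waited longest. So all we need to change is how waiters are inserted into and taken from the self._waiters deque.
The standard library's heapq module, that is, the heap queue algorithm, does the job. The wait method also gains a priority parameter. Because heapq implements a min-heap, the waiter with the smallest priority value is woken first, and the counter stored alongside it keeps arrival order among waiters with equal priority. The implementation is as follows: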
import _thread
import heapq
import itertools
from dataclasses import dataclass, field
from threading import RLock
from typing import Any, Tuple

# Same helper that threading.py uses to create the per-waiter lock.
_allocate_lock = _thread.allocate_lock


@dataclass(order=True)
class PrioritizedWaiter:
    # (priority, sequence number); only this field takes part in comparisons.
    priority: Tuple
    waiter: Any = field(compare=False)


class Condition:
    """Class that implements a condition variable.

    A condition variable allows one or more threads to wait until they are
    notified by another thread.

    If the lock argument is given and not None, it must be a Lock or RLock
    object, and it is used as the underlying lock. Otherwise, a new RLock object
    is created and used as the underlying lock.

    """

    def __init__(self, lock=None):
        if lock is None:
            lock = RLock()
        self._lock = lock
        # Export the lock's acquire() and release() methods
        self.acquire = lock.acquire
        self.release = lock.release
        # If the lock defines _release_save() and/or _acquire_restore(),
        # these override the default implementations (which just call
        # release() and acquire() on the lock).  Ditto for _is_owned().
        try:
            self._release_save = lock._release_save
        except AttributeError:
            pass
        try:
            self._acquire_restore = lock._acquire_restore
        except AttributeError:
            pass
        try:
            self._is_owned = lock._is_owned
        except AttributeError:
            pass
        # A heap (plain list managed by heapq) replaces the original deque.
        self._waiters = []
        # Monotonic sequence number: breaks ties between equal priorities (FIFO).
        self._counter = itertools.count()

    # The elided methods below are unchanged from the standard library version.
    def _at_fork_reinit(self):...
    def __enter__(self):...
    def __exit__(self, *args):...
    def __repr__(self):...
    def _release_save(self):...
    def _acquire_restore(self, x):...
    def _is_owned(self):...

    def wait(self, timeout=None, priority=10):
        """Wait until notified or until a timeout occurs.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.

        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).

        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.

        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        # Smaller priority values are woken first; the counter keeps FIFO order
        # among waiters that share a priority.
        item = PrioritizedWaiter((priority, next(self._counter)), waiter)
        heapq.heappush(self._waiters, item)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                if timeout > 0:
                    gotit = waiter.acquire(True, timeout)
                else:
                    gotit = waiter.acquire(False)
            return gotit
        finally:
            self._acquire_restore(saved_state)
            if not gotit:
                try:
                    self._waiters.remove(item)
                except ValueError:
                    pass
                else:
                    # list.remove() breaks the heap invariant; restore it.
                    heapq.heapify(self._waiters)

    def wait_for(self, predicate, timeout=None):...

    def notify(self, n=1):
        """Wake up one or more threads waiting on this condition, if any.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method wakes up at most n of the threads waiting for the condition
        variable; it is a no-op if no threads are waiting.

        """
        if not self._is_owned():
            raise RuntimeError("cannot notify on un-acquired lock")
        waiters = self._waiters
        while waiters and n > 0:
            # The heap root is the waiter with the smallest (priority, counter).
            waiter = waiters[0].waiter
            try:
                waiter.release()
            except RuntimeError:
                # gh-92530: The previous call of notify() released the lock,
                # but was interrupted before removing it from the queue.
                # It can happen if a signal handler raises an exception,
                # like CTRL+C which raises KeyboardInterrupt.
                pass
            else:
                n -= 1
            # The loop condition guarantees the heap is non-empty, so heappop
            # cannot fail here; it removes the root that was just released.
            heapq.heappop(waiters)

    def notify_all(self):...
    def notifyAll(self):...
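The heap bookkeeping used by wait and notify above can be tried in isolation. In this sketch the waiter field holds a plain name instead of a lock, and the names and priorities are made up for illustration. It shows the three operations the modified Condition relies on: pushing with a (priority, counter) key, removing a timed-out entry and re-heapifying, and popping in wake-up order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Tuple


@dataclass(order=True)
class PrioritizedWaiter:
    priority: Tuple[int, int]            # (priority, arrival sequence number)
    waiter: Any = field(compare=False)   # excluded from ordering and equality


counter = itertools.count()
heap = []

# wait(): push each waiter; smaller priority values sit closer to the root.
for name, prio in [("A", 10), ("B", 1), ("C", 10), ("D", 1)]:
    heapq.heappush(heap, PrioritizedWaiter((prio, next(counter)), name))

# Timeout path: "C" gives up, so remove its entry and restore the heap invariant.
stale = next(w for w in heap if w.waiter == "C")
heap.remove(stale)
heapq.heapify(heap)

# notify(): repeatedly take the root of the heap.
wake_order = []
while heap:
    wake_order.append(heapq.heappop(heap).waiter)

print(wake_order)  # ['B', 'D', 'A']
```

B and D have the same priority and are woken in arrival order, because the counter is the second element of the comparison key. Removing an arbitrary entry and calling heapify costs O(n), which is usually acceptable for a modest number of waiters.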
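That completes the key part. What remains is to add a priority parameter to _PySimpleQueue.get and pass it to Semaphore.acquire, and to add a priority parameter to Semaphore.acquire and pass it on to Condition.wait. For brevity, the full code is not written out here.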
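For readers who want something runnable end to end, here is a compact, self-contained sketch of a queue whose get method accepts a priority. It is not the author's code and it does not patch threading.Condition or Semaphore: it swaps in a simpler direct-handoff design, where put hands the item straight to the best waiting consumer. The names ConsumerPriorityQueue and _Waiter are hypothetical, and a smaller priority value is assumed to be served first, as in the listing above.

```python
import heapq
import itertools
import threading
import time
from collections import deque
from queue import Empty


class _Waiter:
    """One blocked consumer: an event to wake it and a slot for its item."""
    def __init__(self):
        self.event = threading.Event()
        self.item = None


class ConsumerPriorityQueue:
    """Hypothetical unbounded queue; blocked get() calls are served by priority."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()
        self._waiters = []              # heap of (priority, seq, _Waiter)
        self._seq = itertools.count()   # FIFO tie-breaker for equal priorities

    def put(self, item):
        with self._lock:
            if self._waiters:
                # Hand the item directly to the waiter with the smallest key.
                _, _, waiter = heapq.heappop(self._waiters)
                waiter.item = item
                waiter.event.set()
            else:
                self._items.append(item)

    def get(self, block=True, timeout=None, priority=10):
        with self._lock:
            if self._items:
                return self._items.popleft()
            if not block:
                raise Empty
            waiter = _Waiter()
            entry = (priority, next(self._seq), waiter)
            heapq.heappush(self._waiters, entry)
        if waiter.event.wait(timeout):
            return waiter.item
        with self._lock:
            if waiter.event.is_set():   # put() raced with the timeout
                return waiter.item
            self._waiters.remove(entry)
            heapq.heapify(self._waiters)
        raise Empty


q = ConsumerPriorityQueue()
results = {}

def consumer(priority):
    results[priority] = q.get(priority=priority)

threads = [threading.Thread(target=consumer, args=(p,)) for p in (5, 1, 3)]
for t in threads:
    t.start()
while True:                             # wait until all three consumers block
    with q._lock:
        if len(q._waiters) == 3:
            break
    time.sleep(0.01)
for item in ("first", "second", "third"):
    q.put(item)
for t in threads:
    t.join()
print(results)
```

In this run the consumer with priority 1 receives "first", priority 3 receives "second", and priority 5 receives "third", regardless of the order in which the threads started. Handing the item over inside put avoids the situation where a newly arriving consumer grabs an item before the woken, higher-priority waiter gets to run.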
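Note that although a priority parameter has been added, if it is never passed explicitly, every waiter gets the same default priority and the counter breaks ties by arrival order, so the behavior is no different from the original version: it still follows the FIFO policy.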