# Paper reading (55):dynamic multi robot task allocation under uncertainty and temporary constraints

Inge2022-06-23 18:04:31

# 1 introduce

## 1.1 subject

2022： Dynamic multi robot task allocation under uncertainty and time constraints (Dynamic multi-robot task allocation under uncertainty and temporal constraints)

## 1.2 summary

The problem of dynamically assigning tasks to multiple agents under time constraints and task completion uncertainties is considered . Mission The goal is to minimize the number of unsuccessful tasks at the end of the operation . Put forward a kind of Multi robot assignment algorithm , The algorithm decouples the uncertainty from the computational difficulties in the sequential decision-making under Multi-Agent Coordination , And pass Hierarchical mode solve . The lower Use a Dynamic programming of tree search Calculate the policy of a single agent , The upper Resolve conflicts in a single plan to achieve It works Multi agent Distribute . The proposed Random conflict based assignment (Stochastic conflict-based allocation, SCoBA) Will be done under reasonable assumptions and is expected to be optimal . In practice ,SCoBA Of Efficient Computation can meet the requirements of online interleaving planning and execution .
Validation data set Including the picking and placing of Dobby conveyor belts in cities , And multi UAV delivery scheduling . The results are shown in the task success rate ,SCoBA Always better than many baseline methods , And show that compared with Oracle Strong competitive power . It can scale well with the number of tasks and agents .

## 1.4 Bib

@article{
Choudhury:2022:231247,
author = {
Shushman Choudhury and Jayesh K Gupta and Mykel J Kochenderfer and Dorsa Sadigh and Jeannette Bohg},
title = {
Dynamic multi-robot task allocation under uncertainty and temporal constraints},
journal = {
Autonomous Robots},
volume = {
46},
number = {
1},
pages = {
231--247},
year = {
2022},
doi = {
10.1007/s10514-021-10022-9}
}


# 2 Problem statement

existing N N Agents and K K A mission , Write them down as [ N ] [N] and [ K ] [K] . The problem is limited to T T In time steps , For each agent n ∈ [ N ] n\in[N] And tasks k ∈ [ K ] k\in[K] , It serves Time window (Time window) by W n k = ( t n k l , t n k u ) W_{nk}=(t_{nk}^l,t^u_{nk}) , among l l and u u respectively n n Able to handle k k The lower and upper limits of time . If the agent successfully performs the task , There may be additional Downtime (Downtime), For example, the time when the mechanical arm transfers objects to the dustbin . We will Task duration uncertainty (Task duration uncertainty) Defined as ：
τ n k ( t ) = Prob [ n stay when between Step Long t Inside End become k ] . (1) \tag{1} \tau_{nk}(t)=\text{Prob}[n In the time step t Finish in k]. We assume that this cumulative distribution of knowledge is part of the problem statement , Especially task scheduling under uncertainty , At the same time, it is assumed that the specific model is domain related . W W and τ \tau Jointly revealed the task completion rate of the last ：
Prob [ n End become k ] ≤ τ n k ( t n k u − t n k l ) . (2) \tag{2} \text{Prob}[n complete k]\leq\tau_{nk}(t_{nk}^u-t^l_{nk}). For all unsuccessful tasks , The system will be subject to ∑ k J ( k ) \sum_kJ(k) The unit of punishment . An agent can only handle one task at a time .
We seek a kind of An allocation strategy that minimizes cumulative penalties π \pi , It is a proxy to task mapping and its corresponding processing time , namely π : [ N ] → [ K ] × [ T ] \pi:[N]\to[K]\times[T] . Because the completion rate of the task is uncertain , Simple single allocation is obviously not enough . Of course , The processing time of future tasks depends on the completion of earlier tasks . Final , Our optimization problem is as follows ：
argmin ⁡ π ∈ Π E [ ∑ k ∈ [ K ] 1 [ k ] ⋅ J ( k ) ∣ π ]  s.t.  t ∈ W n k ∀ ( k , t ) ∈ π ( n ) (3) \tag{3} \begin{array}{ll} \underset{\pi \in \Pi}{\operatorname{argmin}} & \mathbb{E}\left[\sum_{k \in[K]} \mathbb{1}[k] \cdot J(k) \mid \pi\right] \\ \text { s.t. } & t \in W_{n k} \forall(k, t) \in \pi(n) \end{array} among 1 [ k ] = 1 \mathbb{1}[k]=1 It means task k k Hang in the air , Π \Pi Is a collection of all possible allocation policies , And the constraint is to limit the task to a reasonable time . In the rest , It will be assumed that J ( k ) = 1 J(k)=1 , namely An unweighted penalty where all tasks are equally important .
The above discrete-time rolling bound formula is quite common and useful . It's because of us Focus on upper level assignments rather than lower level task execution , This requires avoiding the additional complexity of continuous time representation . Basic tasks usually involve time constrained trajectory planning , There are perfect models and methods for this . Besides , We can appropriately interleave planning and execution , And recalculate the allocation policy when new tasks appear .

## 2.1 schematic

Describe two different robot settings to instantiate our formula . First , Consider the manipulator distributed along the conveyor belt , Such as chart 2, There are the following restrictions ：
1） Each robot arm has its own collection box , Used to collect objects picked up from the conveyor belt ;
2） Objects arrive at the conveyor belt from the outside , According to the picking strategy or the characteristics of the gripper , The manipulator takes different time to pick up ;
3） The range of the manipulator is limited , An object will not be picked up until it is out of range ;
4） Objects not picked up by the robot arm will be picked up manually later ;
5） The goal is to pick and place as many objects as possible .

chart 2： N = 3 N=3 , K = 5 K=5 Schematic diagram of mechanical arm picking of ： The conveyor belt is a unit length , The workspace of each manipulator spans 0.3 A unit of , The dotted line indicates the limit range . Given belt speed , Any mechanical arm - The time window of the object pair ≤ 5 \leq5 second .

then , Multi UAV Scheduling Considering package delivery in cities ：
1） The delivery task is generated through an external process requested by the customer ;
2） According to flight conditions , It takes different time for the UAV to move from the product warehouse to the package delivery location ;
3） Requests also have a time window , The drone must wait until the user's time window begins , Then deliver the product to the customer , And late delivery will be punished ;
4） Within a certain time frame , The goal is to minimize the number of late deliveries .

## 2.2 Challenge

The following will enlarge The complexity of solving the problem
1） Classification based on multi robot task assignment , Our problem belongs to ST-SR-TA, That is, a single robot handles a single task within the duration .ST-SR-TA problem yes NP A kind of hard scheduling problem , In particular, resource constrained multi-agent scheduling ;
2） The uncertainty of task execution time further increases the difficulty of solving ;
3） The time window requires the algorithm to consider the space-time task relationship , This makes distribution more difficult ;
4） The inflow of new tasks requires our methods to effectively interleave planning and execution , for example , Reschedule when new tasks arrive .

# 3 Hierarchical multi robot task allocation

Algorithmic challenges include Sequential programming under uncertainty as well as Multi agent coordination . For large settings , The joint multi-agent programming problem is computationally prohibitive , The existing methods usually use approximate programming 、 Approximate optimization , Or simple heuristics . Corresponding to this , The proposed ScoBA Use a two-tier hierarchy to handle this problem ：
1） The lower layer independently determines the best task order for each agent and ignores other agents ;
2） The upper layer resolves potential conflicts in cross agent task allocation , To obtain effective multi robot assignment ;
3） Online staggered planning and execution between the two levels , The coordination graph is used to interact with sparse agents .

## 3.1 The lower ： Single agent policy

Known task set 、 Time window , And the uncertainty distribution of task completeness , The goal is to build an agent's task trial policy tree . Because the time consumption of the task is random , The first possible attempt time of a task depends on the task that was attempted before it . We made a simplified approximation —— The agent tries a task as soon as possible , And observe the results at the end of the window . This approximation folds the time dimension by treating the task as a discrete event rather than a continuous event .
Given a single robot n 1 n_1 And three tasks k 1 , k 2 , k 3 k_1,k_2,k_3 , Policy tree The search process is like chart 3
1） Sort tasks in ascending order according to the start time window ;
2） Scan along the timeline , And update the tree at each event node . The event node includes the start of the window , Or the end time after the task is completed ;
3） For the start window , To introduce into or give up Decision node ;
4） For the end window or end time , Introduce output nodes that indicate success or failure , The output probability depends on the minimum feasible start time of the task attempt .

chart 3： The lower SCoBA Generate a policy tree of valid tasks for a single agent . To be specific , By scanning along the time axis and branching at the beginning or end of the task time window . At the beginning of the window , Two new decision nodes ( The ellipse ) By introducing ： Try ( \leftrightarrow ) Or give up ( ↮ \nleftrightarrow ). At the end of the window or after the end time , Output node ( Rounded rectangle ) Describe success or failure . After the trees are created , Dynamic programming propagates values from leaves to roots . Probability value p = 0.03 p=0.03 p ′ = 0.1 p'=0.1 , as well as p ′ ′ = 0.9 p''=0.9 It just means that the same attempt node ( n 1 k 2 n_1 \leftrightarrow k_2 ) Has three different copies With different output probabilities

The leaves of a binary policy tree contain cumulative penalties along its branches , for example , The penalty for each unsuccessful task is 1. Then use dynamic programming to pass values from the leaf to the root . For a pair of output nodes , We take the value of the parent node ( Expressed as V V ) Set to the expected value of its child nodes ：
V ( parent ) : = p ⋅ V ( Fail ) + ( 1 − p ) V ( Succ ) . (4) \tag{4} V(\text{parent}):=p\cdot V(\text{Fail}) + (1-p)V(\text{Succ}). For a pair of decision nodes , The value of the parent node is set to the minimum value of the child node ：
V ( parent ) : = min ⁡ { V ( child1 ) , V ( child2 ) } . (5) \tag{5} V(\text{parent}):=\min\{ V(\text{child1}),V(\text{child2}) \}. Corresponding to chart 3 in , Yes V ( root ) = min ⁡ { V ( n 1 k 1 ) , V ( n 1 ↮ k 1 ) } V(\text{root})=\min\{V(n_1\leftrightarrow k_1),V(n_1\nleftrightarrow k_1)\} . The generated tree encodes the policy , This policy will delegate to all tasks within the scope of the plan Expected penalty minimization , V ( root ) V(\text{root}) Is the value of expected punishment . By tracking the minimum child node until the first attempt at the node , To get the next task assigned to the agent .

## 3.2 The upper ： Multi agent coordination

The policy tree determines an approximately optimal sequence of task attempts for a single agent . For multiple agents , The search of the tree is independent , This could lead to conflict . And because of our The objective function depends on all agents , Therefore, breaking the relationship directly may lead to poor global allocation . Multi-agent routing algorithms face similar challenges , Agent conflicts between shortest paths must be resolved . Conflict based search is an effective strategy to solve this problem , It decouples single agent path planning from path conflict , Proved to be effective in practice , Without losing optimality .
We use the method of conflict resolution between agents in conflict based search , The upper SCoBA Will search for conflicts between the solutions of a single agent obtained from the lower layer , And then generate Binary constraint tree , Such as chart 4. Two agents n 1 n_1 and n 2 n_2 If the same tasks are assigned within overlapping time windows k k , They are conflicting , namely ( k , t 1 ) ∈ π ( n 1 ) (k,t_1)\in\pi(n_1) ( k , t 2 ) ∈ π ( n 2 ) (k,t_2)\in\pi(n_2) , And t 2 ∈ W n 1 , k t_2\in W_{n_1,k} or t 1 ∈ W n 2 , k t_1\in W_{n_2,k} . An agent's constraint is a task that is excluded from the agent's tree search . Each node of the constraint tree includes ：
1） Set of constraints , Tasks that each agent needs to ignore ;
2） Multi-agent assignment considering all constraints ;
3） Allocated cost .
On SCoBA for , Allocation cost Is the sum of the expected punishment of each agent , Expect punishment Is the value of the root node of the current agent policy tree . The allocation cost is used as the best first search criterion in the constraint tree , Will continue until a conflict free assignment is found .

chart 4： The constraint tree node with assignment conflict generates two child nodes , In conflict proxy ( n 1 n_1 and n 3 n_3 ) And tasks ( k 1 k_1 ) Have corresponding constraints on . The best first search on the constraint tree returns the first high-level node with a conflict free assignment

## 3.3 Random conflict based assignment (SCoBA)

Algorithm 1 Shows how to use the search tree as PLANTREE Of a subroutine SCoBA Algorithm .
That's ok 2–6： The initialization constraint tree is empty , Run for each individual agent PLANTREE The distribution of ;
That's ok 9–14： When expanding advanced nodes , Check the validity of the corresponding allocation . If there is no conflict , Then return this assignment as a solution ; otherwise , For each conflict between two or more agents , Add new child nodes , Where constraints are imposed on the agents involved . The child constraint tree node inherits its parent constraint , And add a constraint for a particular delegate .

consider chart 4 Simple illustrative examples in . The root node has a proxy n 1 n_1 and n 3 n_3 Are assigned to tasks k 1 k_1 . This conflict results in two constraints , One is inherited by each of the two child nodes . The first constraint will k 1 k_1 Exclude from recalculated policy tree search n 1 n_1 . The second constraint pair n 3 n_3 It's the same thing . For each new non root node , The lower tree search is rerun only on the agent for which the constraint was added ( That's ok 18). The generated two child nodes are conflict free , But the value of the left child node with lower allocation cost is 2.6 2.6 , It returns as a solution .
our Problem setting is online and random . However , Under some simplified assumptions , We can build SCoBA The optimality and integrity properties of .

proposition 1
If ：
2） Tree search global execution ;
3） The task completion degree is determined by the end time window , that SCoBA Have expectation optimality , namely SCoBA The expectation that the number of outstanding tasks can be minimized before the global time .

proposition 2
In proposition 1 On the basis of ,SCoBA Is a full , That is, if a legal distribution exists ,SCoBA Will return it .

To satisfy optimality , We use existing results in sequential decision making to show that SCoBA Lower level program of , That is, policy tree generation , So that the expectation for a single agent is optimal . Then it proved SCoBA The upper Multi-Agent Coordination satisfies the multi-agent optimality sufficient condition of inheriting conflict based search . It shows SCoBA How the upper constraint tree has a finite number of nodes . therefore , If there is , The best priority search of the system will find effective solutions .

### 3.3.1 Staggered planning and execution

In order to solve the new task , We will plan and execute online interleaving . For single agent policy tree search , Truncate the search scope according to the calculation requirements . Run the scan in the implementation , Until the time window of the first task starts after the end time of all tasks before it . For Multi-Agent Coordination , Thirdly, based on the real-time calculation constraints, the threshold of the number of upper level conflicts is set . If the threshold is exceeded , Return to the current upper level solution . For agents assigned to the same task , Will break the connection at random , And keep the unassigned agents to be assigned to the new task at the next time step .

### 3.3.2 Coordination diagram

SCoBA Applicable to any agent configuration and inter agent constraints , The coordination graph from multi-agent decision making is further used (CG) To improve efficiency . stay CG in , Each node represents a proxy , Potential directional dependencies between each edge coding agent , In this way, only the connected agents need to coordinate their actions , Or, in our example, assign . The choice of problem coordination diagram depends on the domain , for example , In the case of a conveyor belt , The arms are arranged along the conveyor belt , Their workspaces are mutually exclusive , So the coordination graph is a directed chain from the first arm to the last arm .
CG Structure affects SCoBA The upper Multi-Agent Coordination stage . The absence of an edge between two agents means that their possible sets of tasks do not intersect , That is, they do not have conflicting allocations . therefore , In practice ,SCoBA There is no need to consider the dependencies between all agents . If CG It's directional , As in a conveyor belt , We're going to go along CG Topology sorting of running tree search agent . For any agent , Exclude tasks that have been assigned to their predecessors . By construction , You will eventually get a conflict free allocation ( No child nodes will be generated in the upper constraint tree ). If CG It's undirected , For example, in the field of multi UAV delivery , This sort of topological ordering is not feasible , Conflict may be inevitable . however , If CG There are multiple connected components , Then nodes in different components cannot conflict with each other , So you can run in parallel on each component SCoBA.

thank