In two-player zero-sum games such as chess and Go, the outcome is clear-cut: I win, you lose. Once three or more players are involved, the game's complexity changes qualitatively: coalitions, betrayal, and dynamic balance become the key elements.
Multiplayer games are no longer a matter of simply "maximizing your own payoff"; you also have to ask:
- When should I cooperate with others?
- How do I avoid being ganged up on?
- When should I betray an ally?
- How do I balance risk against reward?
This article takes a deep look at the core algorithms for multiplayer game AI (Paranoid, MaxN, Best Reply), coalition-formation strategies, and their application to classic multiplayer games (Sanguosha, Werewolf, Texas Hold'em).
1. The Unique Challenges of Multiplayer Games
1.1 Fundamental Differences from Two-Player Games
Two-player zero-sum games:
- Your gain = your opponent's loss (zero-sum)
- Goal: maximizing your payoff = minimizing your opponent's
- Algorithms: Minimax, Alpha-Beta pruning
Multiplayer non-zero-sum games:
- The total payoff need not be zero
- Shared interests (cooperation) coexist with conflicting interests (competition)
- Social factors (trust, threats, promises) come into play
Example: three-player poker
Starting stacks: A=100, B=100, C=100
Scenario 1 (targeting the leader):
- A is currently ahead (150 chips)
- B and C may team up against A
- A needs to "play weak" to avoid becoming the common target
Scenario 2 (free-riding):
- A attacks B
- C can sit back and watch the fight
- Once A and B have worn each other down, C moves in to collect
1.2 Core Concepts
Nash equilibrium:
- Definition: every player's strategy is a best response to the other players' strategies
- A multiplayer game may have multiple Nash equilibria
Pareto optimality:
- Definition: no player's outcome can be improved without making some other player worse off
- In multiplayer games, a Nash equilibrium is not necessarily Pareto-optimal
Coalition value: \(v(S) = \text{the maximum total payoff coalition } S \text{ can secure}\)
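A minimal sketch tying the first two concepts together (the payoff matrix is the standard Prisoner's Dilemma, supplied here purely for illustration): mutual defection is the unique Nash equilibrium, yet it is the one outcome that is not Pareto-optimal, since mutual cooperation pays both players more.

```python
from itertools import product

# Payoff matrix for the Prisoner's Dilemma: (row payoff, column payoff)
PAYOFF = {
    ('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0), ('D', 'D'): (1, 1),
}

def is_nash(p1, p2):
    """Nash: neither player gains by deviating unilaterally."""
    row_ok = all(PAYOFF[(p1, p2)][0] >= PAYOFF[(a, p2)][0] for a in 'CD')
    col_ok = all(PAYOFF[(p1, p2)][1] >= PAYOFF[(p1, a)][1] for a in 'CD')
    return row_ok and col_ok

def is_pareto_optimal(p1, p2):
    """Pareto-optimal: no other outcome is at least as good for both
    players and strictly better for at least one."""
    u = PAYOFF[(p1, p2)]
    for q in product('CD', repeat=2):
        v = PAYOFF[q]
        if v != u and v[0] >= u[0] and v[1] >= u[1]:
            return False
    return True

for profile in product('CD', repeat=2):
    labels = []
    if is_nash(*profile):
        labels.append('Nash')
    if is_pareto_optimal(*profile):
        labels.append('Pareto-optimal')
    print(profile, labels)
# ('D', 'D') is the only Nash equilibrium, and the only
# profile that is NOT Pareto-optimal
```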
2. Search Algorithms
2.1 The Paranoid Algorithm
Assumption: all other players form a coalition against you.
When to use it: a conservative choice when you are in the lead.
```python
import copy

def paranoid_search(state, depth, max_player):
    """
    Paranoid search: assume every opponent has joined a coalition against max_player.
    Args:
        state: current game state
        depth: remaining search depth
        max_player: the player whose payoff we maximize
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate(max_player), None
    if state.current_player == max_player:
        # Maximize our own payoff
        max_value = float('-inf')
        best_move = None
        for move in state.legal_moves():
            next_state = state.apply_move(move)
            value, _ = paranoid_search(next_state, depth - 1, max_player)
            if value > max_value:
                max_value = value
                best_move = move
        return max_value, best_move
    else:
        # Assume the other players cooperate to minimize our payoff
        min_value = float('inf')
        best_move = None
        for move in state.legal_moves():
            next_state = state.apply_move(move)
            value, _ = paranoid_search(next_state, depth - 1, max_player)
            if value < min_value:
                min_value = value
                best_move = move
        return min_value, best_move

# Usage example (apply_move was missing from the original sketch;
# a minimal placeholder version is added here so the search can run)
class ThreePlayerGame:
    def __init__(self):
        self.current_player = 0
        self.scores = [0, 0, 0]

    def evaluate(self, player):
        """Evaluation function: this player's score."""
        return self.scores[player]

    def legal_moves(self):
        return ["move1", "move2", "move3"]

    def apply_move(self, move):
        """Placeholder move: score some points, then pass the turn."""
        new_state = copy.deepcopy(self)
        points = {"move1": 1, "move2": 2, "move3": 3}[move]
        new_state.scores[new_state.current_player] += points
        new_state.current_player = (new_state.current_player + 1) % 3
        return new_state

    def is_terminal(self):
        return max(self.scores) >= 100

# AI decision
game = ThreePlayerGame()
best_score, best_move = paranoid_search(game, depth=4, max_player=0)
print(f"Best move: {best_move}, expected score: {best_score}")
```
Pros:
- ✅ A conservative strategy, well suited to defending a lead
- ✅ Reduces the game to a two-player search, so Alpha-Beta pruning carries over and the cost stays comparable to two-player search ($O(b^d)$ in the worst case)
Cons:
- ❌ Overly pessimistic: it assumes everyone is out to get you
- ❌ Poorly suited to aggressive play when you are behind
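Because Paranoid collapses all opponents into a single minimizing side, standard Alpha-Beta pruning applies directly. Below is a hedged sketch of that variant, together with a tiny deterministic game I made up for demonstration (none of this is from the article itself): each player in turn adds 1 or 2 to their own score, and the evaluation is the lead of max_player over the strongest rival.

```python
def paranoid_alphabeta(state, depth, max_player,
                       alpha=float('-inf'), beta=float('inf')):
    # Same search as paranoid_search, but with an alpha-beta window:
    # every opponent turn plays the MIN role, so the two-player rule applies.
    if depth == 0 or state.is_terminal():
        return state.evaluate(max_player), None
    best_move = None
    if state.current_player == max_player:
        value = float('-inf')
        for move in state.legal_moves():
            child, _ = paranoid_alphabeta(state.apply_move(move), depth - 1,
                                          max_player, alpha, beta)
            if child > value:
                value, best_move = child, move
            alpha = max(alpha, value)
            if alpha >= beta:  # the opponents already have a better line
                break
        return value, best_move
    else:
        value = float('inf')
        for move in state.legal_moves():
            child, _ = paranoid_alphabeta(state.apply_move(move), depth - 1,
                                          max_player, alpha, beta)
            if child < value:
                value, best_move = child, move
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value, best_move

class CountGame:
    """Tiny deterministic demo game: each turn the current player
    adds 1 or 2 to their own score, for a fixed number of turns."""
    def __init__(self, scores=(0, 0, 0), current_player=0, turns_left=3):
        self.scores = scores
        self.current_player = current_player
        self.turns_left = turns_left
    def is_terminal(self):
        return self.turns_left == 0
    def legal_moves(self):
        return [1, 2]
    def apply_move(self, move):
        s = list(self.scores)
        s[self.current_player] += move
        return CountGame(tuple(s), (self.current_player + 1) % 3,
                         self.turns_left - 1)
    def evaluate(self, player):
        rivals = [v for i, v in enumerate(self.scores) if i != player]
        return self.scores[player] - max(rivals)

value, move = paranoid_alphabeta(CountGame(), depth=3, max_player=0)
print(value, move)  # 0 2
```

Pruning never changes the root value relative to the unpruned search; it only skips branches that can no longer affect it.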
2.2 The MaxN Algorithm
Assumption: each player independently maximizes their own payoff.
When to use it: a more realistic model of multiplayer play.
```python
import copy

def maxn_search(state, depth):
    """
    MaxN search: every player maximizes their own component of the payoff vector.
    Returns:
        tuple: (evaluation vector, best move)
        evaluation vector: [player0_value, player1_value, player2_value, ...]
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate_all(), None
    current_player = state.current_player
    best_value = None
    best_move = None
    for move in state.legal_moves():
        next_state = state.apply_move(move)
        value_tuple, _ = maxn_search(next_state, depth - 1)
        # The player to move picks whatever maximizes their own component
        if best_value is None or value_tuple[current_player] > best_value[current_player]:
            best_value = value_tuple
            best_move = move
    return best_value, best_move

class ThreePlayerGame:
    # Same game as in 2.1, now with a vector-valued evaluation
    def __init__(self):
        self.current_player = 0
        self.scores = [0, 0, 0]

    def evaluate_all(self):
        """Evaluation vector covering all players."""
        return tuple(self.scores)  # (score0, score1, score2)

    def legal_moves(self):
        return ["move1", "move2", "move3"]

    def is_terminal(self):
        return max(self.scores) >= 100

    def apply_move(self, move):
        """Apply a placeholder move and return the new state."""
        new_state = copy.deepcopy(self)
        points = {"move1": 1, "move2": 2, "move3": 3}[move]
        new_state.scores[new_state.current_player] += points
        new_state.current_player = (new_state.current_player + 1) % 3
        return new_state

# Usage
game = ThreePlayerGame()
value_tuple, best_move = maxn_search(game, depth=4)
print(f"Best move: {best_move}")
print(f"Expected outcome: Player0={value_tuple[0]}, Player1={value_tuple[1]}, Player2={value_tuple[2]}")
```
Pros:
- ✅ A more realistic model of how people actually play
- ✅ Can surface the balance point between cooperation and competition
Cons:
- ❌ Expensive in practice: the node count is still $O(b^d)$, but deep Alpha-Beta-style pruning no longer applies, so far less of the tree can be cut off than in the two-player case
- ❌ Needs an accurate evaluation function for every player
2.3 Shallow Pruning
Idea: prune within MaxN wherever the payoff bounds allow it.
```python
def maxn_with_pruning(state, depth, maxsum, bound=None):
    """
    MaxN with shallow pruning. Requires that the components of every
    evaluation vector are non-negative and sum to at most `maxsum`
    (the original sketch shared one mutable bounds list across players,
    which is unsound; this version derives per-child bounds instead).
    Args:
        maxsum: upper bound on the sum of all players' evaluations
        bound: upper bound on what this node's player can usefully
               achieve before the parent stops caring (None at the root)
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate_all(), None
    current_player = state.current_player
    best_value, best_move = None, None
    for move in state.legal_moves():
        next_state = state.apply_move(move)
        # The child's player can take at most maxsum minus what we have
        # already secured here, so pass that down as the child's bound
        child_bound = maxsum - (best_value[current_player] if best_value else 0)
        value_tuple, _ = maxn_with_pruning(next_state, depth - 1, maxsum, child_bound)
        if best_value is None or value_tuple[current_player] > best_value[current_player]:
            best_value, best_move = value_tuple, move
        # Prune: since values sum to at most maxsum, nothing in the
        # remaining siblings can still improve the parent's choice
        if bound is not None and best_value[current_player] >= bound:
            break
    return best_value, best_move
```
2.4 Best Reply Search
Idea: compute a best response to the other players' strategies. (Note: the sketch below is iterated best-response dynamics for approximating a Nash equilibrium; the tree-search algorithm published under the name Best Reply Search, in which all opponents jointly make a single best reply move, is a related but distinct idea.)
```python
class BestReplySearch:
    def __init__(self, num_players):
        self.num_players = num_players

    def find_nash_equilibrium(self, game, max_iterations=100):
        """Search for a Nash equilibrium via iterated best responses."""
        # Initialize with random strategies
        strategies = [self.random_strategy(game) for _ in range(self.num_players)]
        for iteration in range(max_iterations):
            improved = False
            for player in range(self.num_players):
                # Hold the other players' strategies fixed
                opponent_strategies = strategies[:player] + strategies[player+1:]
                # Compute this player's best response
                best_response = self.best_response(game, player, opponent_strategies)
                # Adopt it if it improves on the current strategy
                if self.is_better(best_response, strategies[player], player, game):
                    strategies[player] = best_response
                    improved = True
            # No player can improve unilaterally: a Nash equilibrium
            if not improved:
                break
        return strategies

    def best_response(self, game, player, opponent_strategies):
        """Best response of `player` to the opponents' strategies."""
        best_payoff = float('-inf')
        best_strategy = None
        # Enumerate every candidate strategy
        for strategy in self.all_strategies(game, player):
            # Expected payoff of this strategy against the opponents
            payoff = game.evaluate_strategy(player, strategy, opponent_strategies)
            if payoff > best_payoff:
                best_payoff = payoff
                best_strategy = strategy
        return best_strategy

    # Note: random_strategy, all_strategies and is_better are game-specific
    # hooks left abstract here; game.evaluate_strategy must likewise be
    # supplied by the game implementation.
```
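To show the loop above actually converging, here is a self-contained sketch on a toy three-player coordination game (the game and its payoffs are my own, not from the article): each player picks 0 or 1 and scores one point per other player who matches them. Iterated best responses reach a profile from which no player wants to deviate, i.e. a pure Nash equilibrium.

```python
def payoff(profile, player):
    # Coordination payoff: 1 point per other player with the same choice
    return sum(1 for j, a in enumerate(profile)
               if j != player and a == profile[player])

def best_response(profile, player):
    """Best pure action for `player` against the others' fixed actions."""
    return max((0, 1), key=lambda a: payoff(
        profile[:player] + (a,) + profile[player+1:], player))

def iterate_best_responses(profile, max_rounds=20):
    for _ in range(max_rounds):
        improved = False
        for player in range(len(profile)):
            br = best_response(profile, player)
            if br != profile[player]:
                profile = profile[:player] + (br,) + profile[player+1:]
                improved = True
        if not improved:  # fixed point: a pure Nash equilibrium
            return profile
    return profile

print(iterate_best_responses((0, 1, 1)))  # converges to (1, 1, 1)
```

In general, best-response dynamics is only guaranteed to converge in certain game classes (e.g. potential games, of which this coordination game is one); in other games it can cycle.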
3. Coalitions and Negotiation
3.1 The Shapley Value (dividing a coalition's payoff)
Principle: distribute the payoff according to each player's contribution to the coalition.
\[\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[v(S \cup \{i\}) - v(S)\right]\]

```python
import math
from itertools import combinations

def shapley_value(player, coalition_values, all_players):
    """
    Compute the Shapley value.
    Args:
        player: the target player
        coalition_values: dict {frozenset: value}
        all_players: set of all players
    """
    n = len(all_players)
    shapley = 0
    # Enumerate every coalition that does not contain the player
    other_players = all_players - {player}
    for r in range(len(other_players) + 1):
        for subset in combinations(other_players, r):
            S = frozenset(subset)
            S_with_player = S | {player}
            # Marginal contribution of joining S
            marginal_contribution = coalition_values[S_with_player] - coalition_values[S]
            # Weight of a coalition of this size
            weight = (
                math.factorial(len(S)) *
                math.factorial(n - len(S) - 1) /
                math.factorial(n)
            )
            shapley += weight * marginal_contribution
    return shapley

# Example: a three-player game
coalition_values = {
    frozenset(): 0,
    frozenset(['A']): 10,
    frozenset(['B']): 10,
    frozenset(['C']): 10,
    frozenset(['A', 'B']): 30,  # A and B cooperate especially well
    frozenset(['A', 'C']): 25,
    frozenset(['B', 'C']): 25,
    frozenset(['A', 'B', 'C']): 60
}
all_players = frozenset(['A', 'B', 'C'])
for player in ['A', 'B', 'C']:
    value = shapley_value(player, coalition_values, all_players)
    print(f"Shapley value of {player}: {value:.2f}")
```
3.2 The Core
Definition: a payoff allocation under which no sub-coalition has an incentive to break away from the grand coalition.
\[\text{Core} = \left\{ x \in \mathbb{R}^n : \sum_{i \in N} x_i = v(N),\; \sum_{i \in S} x_i \geq v(S)\ \ \forall S \subseteq N \right\}\]

```python
from itertools import combinations
from scipy.optimize import linprog

def find_core_allocation(coalition_values, all_players):
    """
    Find one core allocation via linear programming.
    (Any feasible point will do, so the objective is essentially arbitrary.)
    """
    n = len(all_players)
    players = list(all_players)
    # Objective: minimize the total allocation (the equality constraint
    # fixes the total anyway; we only care about feasibility)
    c = [1] * n
    # Constraint 1: the allocations sum to v(N)
    A_eq = [[1] * n]
    b_eq = [coalition_values[all_players]]
    # Constraint 2: every sub-coalition S receives at least v(S)
    A_ub = []
    b_ub = []
    for r in range(1, n):
        for subset in combinations(players, r):
            S = frozenset(subset)
            constraint = [-1 if p in subset else 0 for p in players]
            A_ub.append(constraint)
            b_ub.append(-coalition_values[S])
    # Solve
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    if result.success:
        return dict(zip(players, result.x))
    else:
        return None  # the core is empty

# Usage (coalition_values and all_players as defined in 3.1)
allocation = find_core_allocation(coalition_values, all_players)
if allocation:
    print("Core allocation:")
    for player, value in allocation.items():
        print(f"{player}: {value:.2f}")
else:
    print("The core is empty (the grand coalition may be unstable)")
```
3.3 Dynamic Coalition Formation
```python
from itertools import combinations

class DynamicCoalitionGame:
    def __init__(self, players, coalition_values):
        self.players = players
        # Coalitions missing from this dict are treated as worth 0 below
        # (the original sketch indexed the dict directly, which raised
        # KeyError for unlisted coalitions)
        self.coalition_values = coalition_values
        self.current_coalitions = {frozenset([p]) for p in players}  # everyone starts alone

    def value(self, coalition):
        """Value of a coalition, defaulting to 0 when unspecified."""
        return self.coalition_values.get(coalition, 0)

    def should_merge(self, coalition1, coalition2):
        """Decide whether two coalitions should merge."""
        merged = coalition1 | coalition2
        # Compare the merged value with staying separate
        merged_value = self.value(merged)
        separate_value = self.value(coalition1) + self.value(coalition2)
        # Merge only if it pays and the gains can be divided fairly
        if merged_value > separate_value:
            return self.has_fair_allocation(merged)
        return False

    def has_fair_allocation(self, coalition):
        """Check that the coalition admits a fair division of its value."""
        # Build a complete sub-game over this coalition (missing entries -> 0)
        members = list(coalition)
        sub_values = {
            frozenset(s): self.value(frozenset(s))
            for r in range(len(members) + 1)
            for s in combinations(members, r)
        }
        # Simplified check via Shapley values. Note: by the efficiency
        # property, Shapley values always sum to v(coalition), so this
        # test always passes; a real implementation would test core
        # membership instead.
        shapley_sum = sum(
            shapley_value(p, sub_values, coalition) for p in coalition
        )
        return abs(shapley_sum - self.value(coalition)) < 1e-6

    def evolve_coalitions(self, rounds=10):
        """Let the coalition structure evolve greedily."""
        for round_num in range(rounds):
            print(f"\n=== Round {round_num + 1} ===")
            print(f"Current coalitions: {self.current_coalitions}")
            # Try every pairwise merge and keep the most profitable one
            best_merge = None
            best_gain = 0
            coalitions_list = list(self.current_coalitions)
            for i in range(len(coalitions_list)):
                for j in range(i + 1, len(coalitions_list)):
                    c1, c2 = coalitions_list[i], coalitions_list[j]
                    if self.should_merge(c1, c2):
                        gain = (
                            self.value(c1 | c2) -
                            self.value(c1) -
                            self.value(c2)
                        )
                        if gain > best_gain:
                            best_gain = gain
                            best_merge = (c1, c2)
            # Apply the best merge, if any
            if best_merge:
                c1, c2 = best_merge
                self.current_coalitions.remove(c1)
                self.current_coalitions.remove(c2)
                self.current_coalitions.add(c1 | c2)
                print(f"Merged: {c1} + {c2} = {c1 | c2}, gain: {best_gain:.2f}")
            else:
                print("No profitable merge left; the coalition structure is stable")
                break
        return self.current_coalitions

# Usage (shapley_value as defined in 3.1)
game = DynamicCoalitionGame(
    players=['A', 'B', 'C', 'D'],
    coalition_values={
        frozenset(): 0,
        frozenset(['A']): 10,
        frozenset(['B']): 10,
        frozenset(['C']): 10,
        frozenset(['D']): 10,
        frozenset(['A', 'B']): 30,
        frozenset(['C', 'D']): 25,
        frozenset(['A', 'B', 'C']): 50,
        frozenset(['A', 'B', 'D']): 48,
        frozenset(['A', 'B', 'C', 'D']): 70
    }
)
final_coalitions = game.evolve_coalitions()
print(f"\nFinal coalitions: {final_coalitions}")
```
4. Case Studies
4.1 A Sanguosha (Legends of the Three Kingdoms) AI
Game characteristics:
- 3-8 players with hidden roles (Lord, Loyalist, Rebel, Spy)
- Coalitions shift dynamically (the Rebels ally against the Lord)
- Incomplete information (other players' roles are unknown)
```python
class SanguoshaAI:
    # Roles: 'lord' (主公), 'loyalist' (忠臣), 'rebel' (反贼), 'spy' (内奸)
    def __init__(self, my_identity, visible_identities, game_phase='early'):
        self.my_identity = my_identity
        self.visible_identities = visible_identities
        self.game_phase = game_phase  # referenced but never set in the original sketch

    def evaluate_threat(self, player):
        """Rate how threatening another player is to us."""
        if self.my_identity == 'lord':
            # Lord's view: rebels > spy > loyalists
            if player.suspected_identity == 'rebel':
                return 10
            elif player.suspected_identity == 'spy':
                return 5
            else:
                return 0
        elif self.my_identity == 'rebel':
            # Rebel's view: lord > loyalists > spy
            if player.is_lord:
                return 10
            elif player.suspected_identity == 'loyalist':
                return 7
            else:
                return 3
        elif self.my_identity == 'spy':
            # Spy's view: keep the table balanced and remove the biggest threat;
            # early on, help the rebels wear down the lord, then clean up late
            if self.game_phase == 'early':
                if player.is_lord:
                    return 5
                else:
                    return player.current_hp  # high HP means a bigger threat
        return 0  # fallback for cases this sketch does not cover

    def choose_target(self, possible_targets):
        """Pick an attack target, weighing threat, HP, hand size and range."""
        scores = []
        for target in possible_targets:
            score = 0
            # Threat level
            score += self.evaluate_threat(target) * 10
            # HP: prioritize finishing off low-health targets
            if target.hp <= 2:
                score += 20
            # Defensive equipment
            score -= len(target.armor) * 5
            # Alliances: never waste attacks on friends
            if self.is_ally(target):
                score -= 100
            scores.append((score, target))
        return max(scores, key=lambda x: x[0])[1]

    def is_ally(self, player):
        """Is this player on our side?"""
        if self.my_identity == 'lord':
            return player.suspected_identity == 'loyalist'
        elif self.my_identity == 'loyalist':
            return player.is_lord
        elif self.my_identity == 'rebel':
            return player.suspected_identity == 'rebel'
        else:  # the spy has no allies
            return False
```
4.2 A Werewolf AI
Game characteristics:
- Extreme information asymmetry
- Requires deduction, deception and persuasion
- Decisions are made by vote
```python
class WerewolfAI:
    # Roles: 'werewolf' (狼人), 'villager' (村民), 'seer' (预言家), 'witch' (女巫)
    def __init__(self, role):
        self.role = role
        self.beliefs = {}  # belief over each player's role

    def update_beliefs(self, observations):
        """Update beliefs from observed actions (a rough heuristic,
        not a true Bayesian posterior)."""
        for player, action in observations:
            if action == 'claim_seer':
                if player not in self.beliefs:
                    self.beliefs[player] = {'seer': 0.5, 'werewolf': 0.3, 'villager': 0.2}
                else:
                    # A second seer claim makes this player look like a wolf
                    if any(b.get('seer', 0) > 0.5 for b in self.beliefs.values()):
                        self.beliefs[player]['werewolf'] += 0.3
                    else:
                        self.beliefs[player]['seer'] += 0.3

    def vote(self, candidates):
        """Voting decision."""
        if self.role == 'werewolf':
            # Wolves vote out the most threatening power role
            threat_scores = []
            for player in candidates:
                threat = 0
                if self.beliefs.get(player, {}).get('seer', 0) > 0.5:
                    threat = 10
                elif self.beliefs.get(player, {}).get('witch', 0) > 0.5:
                    threat = 8
                threat_scores.append((threat, player))
            return max(threat_scores, key=lambda x: x[0])[1]
        else:
            # Villagers vote for the most likely wolf
            wolf_probs = []
            for player in candidates:
                prob = self.beliefs.get(player, {}).get('werewolf', 0)
                wolf_probs.append((prob, player))
            return max(wolf_probs, key=lambda x: x[0])[1]

    def speak(self, phase):
        """Speech strategy."""
        if self.role == 'werewolf':
            return self.wolf_speak_strategy(phase)
        elif self.role == 'seer':
            return self.seer_speak_strategy(phase)
        else:
            return self.villager_speak_strategy(phase)

    def wolf_speak_strategy(self, phase):
        """What a wolf says at each stage of the game."""
        if phase == 'early':
            # Keep a low profile early on; do not stand out
            return "I'm a villager, I have no special information"
        elif phase == 'mid':
            # Mid-game: stir things up, cast doubt on the real seer
            return "I suspect player X is a fake seer"
        else:
            # Late game: counterattack
            return "By elimination, player Y must be a wolf"
```
4.3 Multiplayer Strategy in Texas Hold'em
```python
class TexasHoldemMultiplayer:
    def __init__(self, num_players):
        self.num_players = num_players

    def compute_nash_equilibrium(self, game_state):
        """Compute an equilibrium strategy, e.g. via
        Counterfactual Regret Minimization (CFR)."""
        # Left as a stub; a simplified version would only
        # consider the current betting round
        pass

    def adjust_for_position(self, strategy, position, num_active):
        """Adjust the strategy by table position."""
        # Button (acts last): play more aggressively
        # Blinds (act first): play more conservatively
        if position == num_active - 1:  # button
            strategy['raise_prob'] *= 1.5
            strategy['bluff_prob'] *= 1.3
        elif position < 2:  # blinds
            strategy['fold_prob'] *= 1.2
            strategy['raise_prob'] *= 0.8
        return strategy

    def exploit_weaknesses(self, opponent_models):
        """Exploit known opponent tendencies."""
        exploits = []
        for opponent in opponent_models:
            if opponent['fold_freq'] > 0.7:
                # Folds too often -> bluff more
                exploits.append(('bluff_more', opponent['id']))
            elif opponent['call_freq'] > 0.6:
                # Calls too often -> bluff less, bet big for value
                exploits.append(('value_bet', opponent['id']))
        return exploits
```
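The CFR method above is left as a stub, but its core building block, regret matching, is simple enough to sketch in full. The example below (my own illustration, not from the article) runs regret matching for rock-paper-scissors against a fixed, exploitable opponent mixture; the learner's average strategy drifts toward the best response (always paper).

```python
def regret_matching(opponent_mix, iterations=1000):
    """Regret matching for rock-paper-scissors (0=rock, 1=paper, 2=scissors)
    against a fixed opponent mixture. Returns the average strategy."""
    # utility[a][b]: payoff of playing a against b
    utility = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    regret_sum = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        # Current strategy is proportional to positive cumulative regret
        positive = [max(r, 0.0) for r in regret_sum]
        norm = sum(positive)
        strategy = [p / norm for p in positive] if norm > 0 else [1 / 3] * 3
        for a in range(3):
            strategy_sum[a] += strategy[a]
        # Expected utility of each action, and of the current strategy
        action_util = [sum(utility[a][b] * opponent_mix[b] for b in range(3))
                       for a in range(3)]
        node_util = sum(strategy[a] * action_util[a] for a in range(3))
        # Accumulate regret for not having played each action
        for a in range(3):
            regret_sum[a] += action_util[a] - node_util
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# Opponent overplays rock: paper is the best response
avg = regret_matching(opponent_mix=[0.5, 0.3, 0.2], iterations=1000)
print([round(p, 3) for p in avg])  # paper (index 1) dominates
```

Full CFR applies this same update at every information set of the game tree, weighting regrets counterfactually; against another regret matcher (rather than a fixed opponent), the average strategies converge to an equilibrium.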
5. Advanced Techniques
5.1 Opponent Modeling
```python
class OpponentModel:
    def __init__(self, player_id):
        self.player_id = player_id
        self.action_history = []
        self.strategy_distribution = None

    def update(self, state, action):
        """Update the model with a newly observed action."""
        self.action_history.append((state, action))
        # Re-infer the strategy distribution from the history
        self.strategy_distribution = self.infer_strategy()

    def infer_strategy(self):
        """Infer the opponent's playing style from observed behavior."""
        strategy = {
            'aggressive': 0,
            'conservative': 0,
            'balanced': 0
        }
        for state, action in self.action_history:
            # Assumes actions expose is_aggressive / is_conservative flags
            if action.is_aggressive:
                strategy['aggressive'] += 1
            elif action.is_conservative:
                strategy['conservative'] += 1
            else:
                strategy['balanced'] += 1
        # Normalize (guarding against an empty history)
        total = sum(strategy.values())
        if total == 0:
            return {k: 1 / 3 for k in strategy}
        return {k: v / total for k, v in strategy.items()}

    def predict_action(self, state):
        """Predict the opponent's next action."""
        # The simulate_*_action methods are game-specific hooks, left abstract
        if self.strategy_distribution['aggressive'] > 0.5:
            return self.simulate_aggressive_action(state)
        elif self.strategy_distribution['conservative'] > 0.5:
            return self.simulate_conservative_action(state)
        else:
            return self.simulate_balanced_action(state)
```
5.2 The Meta-Game

```python
import random

class MetaGameAI:
    """An AI that accounts for opponents modeling it in turn."""
    def __init__(self):
        self.level = 2  # depth of "I know that you know that I know..."

    def level_k_reasoning(self, state, k):
        """Level-k reasoning (a simplified sketch: the opponent's whole
        policy is collapsed into a single predicted action)."""
        if k == 0:
            # Level-0: a random or otherwise naive policy
            return random.choice(state.legal_moves())
        # Level-k: assume the opponent reasons at level k-1
        opponent_action = self.level_k_reasoning(state, k - 1)
        # Best-respond to the level-(k-1) opponent
        # (best_response_to is a game-specific hook, left abstract here)
        return self.best_response_to(state, opponent_action)

    def mixed_strategy(self, pure_strategies, weights):
        """Randomize over pure strategies so we cannot be predicted."""
        return random.choices(pure_strategies, weights=weights)[0]
```
6. Summary
Key points for multiplayer game AI design:
- Algorithm choice:
  - When ahead: Paranoid (defensive)
  - When even: MaxN (balanced)
  - When behind: aggressive play plus coalitions
- Coalition strategy:
  - Use Shapley values to assess contributions
  - Look for core allocations to keep coalitions stable
  - Adjust coalitions dynamically
- Information handling:
  - Opponent modeling (Bayesian-style updates)
  - Information hiding (mixed strategies)
  - Deception, and seeing through it
- Practical techniques:
  - Exploit positional advantage
  - Exploit opponents' weaknesses
  - Think at the meta-game level
Suggested learning path:
- Implement Paranoid and MaxN
- Learn the game-theory basics (Nash equilibrium, Shapley values)
- Practice on simple multiplayer games (e.g. three-player tic-tac-toe)
- Move on to imperfect-information games (Werewolf, Texas Hold'em)
References
Papers:
- The Complexity of Computing a Nash Equilibrium
- Paranoid vs. Max^n
- Coalition Formation in Multi-Agent Systems
Books:
- Game Theory: Analysis of Conflict (Roger Myerson)
- Multiagent Systems (Yoav Shoham & Kevin Leyton-Brown)
Games to study:
- Sanguosha: hidden roles + coalitions
- Werewolf: deduction + deception
- Catan: resource negotiation + coalitions
- Diplomacy: pure negotiation