Multiplayer Game AI Design

Strategy and Coalitions in Games with Three or More Players

Posted by Feng Yu on September 3, 2024

In two-player zero-sum games (such as chess and Go), the outcome is clear-cut: I win, you lose. But once a game has three or more players, its complexity changes qualitatively: coalitions, betrayal, and dynamic balance become the key elements.

Multiplayer games are no longer a matter of simply "maximizing your own payoff"; you also have to consider:

  • When should you cooperate with others?
  • How do you avoid being ganged up on?
  • When should you betray an ally?
  • How do you balance risk and reward?

This article takes a deep look at the core algorithms for multiplayer game AI (Paranoid, MaxN, Best Reply), coalition-formation strategies, and applications to classic multiplayer games (Sanguosha, Werewolf, Texas Hold'em).


1. The Unique Challenges of Multiplayer Games

1.1 Fundamental Differences from Two-Player Games

Two-player zero-sum games

  • Your gain = your opponent's loss (zero-sum)
  • Goal: maximizing your own payoff = minimizing your opponent's
  • Algorithms: Minimax, alpha-beta pruning

Multiplayer non-zero-sum games

  • The total payoff need not be zero
  • Shared interests (cooperation) coexist with conflicting interests (competition)
  • Social factors matter: trust, threats, commitments

Example: three-player poker

Starting chips: A=100, B=100, C=100

Scenario 1 (gang up on the leader):
- A is currently ahead (150 chips)
- B and C may team up to attack A
- A needs to "play weak" to avoid becoming the common target

Scenario 2 (free riding):
- A attacks B
- C can sit back and watch the fight
- Once A and B have worn each other down, C moves in to collect

1.2 Core Concepts

Nash equilibrium

  • Definition: every player's strategy is a best response to the other players' strategies
  • A multiplayer game may have multiple Nash equilibria

Pareto optimality

  • Definition: no player's situation can be improved without making someone else worse off
  • In multiplayer games, a Nash equilibrium is not necessarily Pareto-optimal

Coalition value: \(v(S) = \text{the maximum total payoff coalition } S \text{ can secure}\)
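
The claim that a game can have multiple Nash equilibria is easy to verify by brute force on a small example. A minimal sketch, checking every action profile for profitable unilateral deviations; the "Battle of the Sexes" payoffs below are a standard textbook example, not taken from this article:

```python
def pure_nash_equilibria(payoffs):
    """Enumerate pure-strategy Nash equilibria of an n-player game.

    payoffs: dict mapping an action profile (tuple, one action per player)
    to a payoff tuple (one payoff per player).
    """
    profiles = list(payoffs.keys())
    n = len(next(iter(payoffs.values())))
    # Each player's action set, recovered from the listed profiles
    actions = [sorted({p[i] for p in profiles}) for i in range(n)]

    equilibria = []
    for profile in profiles:
        stable = True
        for i in range(n):
            for alt in actions[i]:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoffs[deviated][i] > payoffs[profile][i]:
                    stable = False  # player i has a profitable deviation
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# "Battle of the Sexes": a coordination game with two pure equilibria,
# illustrating that equilibria need not be unique.
payoffs = {
    ('opera', 'opera'):   (2, 1),
    ('opera', 'boxing'):  (0, 0),
    ('boxing', 'opera'):  (0, 0),
    ('boxing', 'boxing'): (1, 2),
}
print(pure_nash_equilibria(payoffs))
```

The same enumeration works unchanged for three or more players, as long as every action profile is listed in `payoffs`.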


2. Search Algorithms

2.1 The Paranoid Algorithm

Assumption: all other players form a coalition against you.

When to use it: a conservative strategy, suitable when you are in the lead.

import copy

def paranoid_search(state, depth, max_player):
    """
    Paranoid algorithm: assume all opponents form one coalition.

    Args:
        state: current game state
        depth: search depth
        max_player: the player whose payoff we maximize
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate(max_player), None

    if state.current_player == max_player:
        # Maximize our own payoff
        max_value = float('-inf')
        best_move = None

        for move in state.legal_moves():
            next_state = state.apply_move(move)
            value, _ = paranoid_search(next_state, depth - 1, max_player)

            if value > max_value:
                max_value = value
                best_move = move

        return max_value, best_move
    else:
        # Assume the other players jointly minimize our payoff
        min_value = float('inf')
        best_move = None

        for move in state.legal_moves():
            next_state = state.apply_move(move)
            value, _ = paranoid_search(next_state, depth - 1, max_player)

            if value < min_value:
                min_value = value
                best_move = move

        return min_value, best_move

# Usage example
class ThreePlayerGame:
    def __init__(self):
        self.current_player = 0
        self.scores = [0, 0, 0]

    def evaluate(self, player):
        """Evaluation function: return this player's score."""
        return self.scores[player]

    def legal_moves(self):
        return ["move1", "move2", "move3"]

    def apply_move(self, move):
        """Apply a move and return the new state (move logic omitted)."""
        new_state = copy.deepcopy(self)
        new_state.current_player = (new_state.current_player + 1) % 3
        return new_state

    def is_terminal(self):
        return max(self.scores) >= 100

# AI decision
game = ThreePlayerGame()
best_score, best_move = paranoid_search(game, depth=4, max_player=0)
print(f"Best move: {best_move}, expected score: {best_score}")

Pros

  • ✅ A conservative strategy, well suited to defending a lead
  • ✅ Reduces the game to a two-player zero-sum search with cost $O(b^d)$, so alpha-beta pruning applies

Cons

  • ❌ Overly pessimistic: it assumes everyone is out to get you
  • ❌ Poorly suited to aggressive play when you are behind
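
Because Paranoid collapses all opponents into a single minimizing "coalition player", standard alpha-beta windows carry over unchanged. A minimal sketch over a hand-built toy game tree; the `(player, children)` tuple / leaf-dict format is an assumption made just for this example:

```python
def paranoid_alphabeta(node, depth, alpha, beta, max_player):
    """Paranoid search with alpha-beta pruning.

    node: (player, children) for internal nodes, or a payoff dict
          {player: value} for leaves (toy tree format for illustration).
    All non-max players are treated as one minimizing coalition.
    """
    if depth == 0 or not isinstance(node, tuple):
        return node[max_player]

    player, children = node
    if player == max_player:
        value = float('-inf')
        for child in children:
            value = max(value, paranoid_alphabeta(child, depth - 1, alpha, beta, max_player))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off
        return value
    else:
        value = float('inf')
        for child in children:
            value = min(value, paranoid_alphabeta(child, depth - 1, alpha, beta, max_player))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cut-off
        return value

# A tiny 3-player position: player 0 to move, then player 1
# (acting for the whole coalition under the Paranoid assumption).
tree = (0, [
    (1, [{0: 3}, {0: 5}]),   # coalition picks min -> 3
    (1, [{0: 6}, {0: 1}]),   # coalition picks min -> 1
])
print(paranoid_alphabeta(tree, 3, float('-inf'), float('inf'), 0))
```

Player 0's best guaranteed value here is 3: the first branch yields min(3, 5) = 3, the second only min(6, 1) = 1.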

2.2 The MaxN Algorithm

Assumption: each player independently maximizes their own payoff.

When to use it: a more realistic model of multiplayer play.

import copy

def maxn_search(state, depth):
    """
    MaxN algorithm: each player maximizes their own payoff.

    Returns:
        tuple: (evaluation vector, best move)
        evaluation vector: [player0_value, player1_value, player2_value, ...]
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate_all(), None

    current_player = state.current_player
    best_value = None
    best_move = None

    for move in state.legal_moves():
        next_state = state.apply_move(move)
        value_tuple, _ = maxn_search(next_state, depth - 1)

        # The current player picks the move maximizing their own component
        if best_value is None or value_tuple[current_player] > best_value[current_player]:
            best_value = value_tuple
            best_move = move

    return best_value, best_move

class ThreePlayerGame:
    # Extends the ThreePlayerGame sketch from section 2.1 with:
    def evaluate_all(self):
        """Return evaluation values for all players."""
        return tuple(self.scores)  # (score0, score1, score2)

    def apply_move(self, move):
        """Apply a move and return the new state."""
        new_state = copy.deepcopy(self)
        # Move logic goes here
        new_state.current_player = (new_state.current_player + 1) % 3
        return new_state

# Usage
game = ThreePlayerGame()
value_tuple, best_move = maxn_search(game, depth=4)
print(f"Best move: {best_move}")
print(f"Expected result: Player0={value_tuple[0]}, Player1={value_tuple[1]}, Player2={value_tuple[2]}")

Pros

  • ✅ A better match for how multiplayer games actually unfold
  • ✅ Can surface the balance point between cooperation and competition

Cons

  • ❌ Explores $O(b^d)$ nodes like Minimax, but deep alpha-beta-style pruning is impossible, so the achievable search depth is much lower in practice
  • ❌ Requires an accurate evaluation function for every player

2.3 Shallow Pruning

Idea: the only sound pruning available in MaxN. If the components of every evaluation vector sum to at most a known constant MAXSUM, a branch can be cut as soon as the child's player secures so much value that the parent's player can no longer benefit from that branch.

MAXSUM = 100  # assumed upper bound on the sum of each evaluation vector

def maxn_with_pruning(state, depth, parent_bound=float('inf')):
    """
    MaxN + shallow pruning.

    Args:
        parent_bound: once the current player's value reaches this bound,
            the parent node will never select this branch, so we can stop.
    """
    if depth == 0 or state.is_terminal():
        return state.evaluate_all(), None

    current_player = state.current_player
    best_value = None
    best_move = None

    for move in state.legal_moves():
        next_state = state.apply_move(move)
        # Bound for the child: if the child's player gets more than
        # MAXSUM - (our best so far), this branch cannot improve on it.
        child_bound = MAXSUM if best_value is None else MAXSUM - best_value[current_player]
        value_tuple, _ = maxn_with_pruning(next_state, depth - 1, child_bound)

        if best_value is None or value_tuple[current_player] > best_value[current_player]:
            best_value = value_tuple
            best_move = move

        # Shallow pruning: the parent will never pick this node anyway
        if best_value[current_player] >= parent_bound:
            break

    return best_value, best_move

2.4 Best Reply Search

Idea: compute a best response to the other players' strategies, iterating until no player can improve.

class BestReplySearch:
    def __init__(self, num_players):
        self.num_players = num_players

    def find_nash_equilibrium(self, game, max_iterations=100):
        """Search for a Nash equilibrium via iterated best responses."""
        # Initialize with random strategies
        strategies = [self.random_strategy(game) for _ in range(self.num_players)]

        for iteration in range(max_iterations):
            improved = False

            for player in range(self.num_players):
                # Hold the other players' strategies fixed
                opponent_strategies = strategies[:player] + strategies[player+1:]

                # Compute this player's best response
                best_response = self.best_response(game, player, opponent_strategies)

                # Switch if it beats the current strategy
                if self.is_better(best_response, strategies[player], player, game):
                    strategies[player] = best_response
                    improved = True

            # No player can improve: we have reached a Nash equilibrium
            if not improved:
                break

        return strategies

    def best_response(self, game, player, opponent_strategies):
        """Best response of `player` to the opponents' fixed strategies."""
        best_payoff = float('-inf')
        best_strategy = None

        # Enumerate all candidate strategies
        for strategy in self.all_strategies(game, player):
            # Expected payoff of this strategy against the fixed opponents
            payoff = game.evaluate_strategy(player, strategy, opponent_strategies)

            if payoff > best_payoff:
                best_payoff = payoff
                best_strategy = strategy

        return best_strategy

    # random_strategy, all_strategies and is_better are game-specific
    # hooks left to the concrete game implementation.
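
The class above leaves its helpers abstract. The same iterate-until-fixed-point idea can be shown end to end on a tiny two-player game; the stag-hunt payoffs below are a standard textbook example, not from this article:

```python
def best_response_dynamics(payoffs, actions, start, max_iterations=100):
    """Iterated best response for a two-player game in pure strategies.

    payoffs[(a0, a1)] -> (u0, u1); start is an initial action profile.
    Returns the profile reached when no player wants to deviate
    (a pure Nash equilibrium), or the last profile visited if it cycles.
    """
    profile = list(start)
    for _ in range(max_iterations):
        improved = False
        for i in (0, 1):
            def payoff_of(a, i=i):
                trial = profile[:]
                trial[i] = a
                return payoffs[tuple(trial)][i]
            best = max(actions[i], key=payoff_of)
            if payoff_of(best) > payoff_of(profile[i]):
                profile[i] = best
                improved = True
        if not improved:
            break  # mutual best responses: a Nash equilibrium
    return tuple(profile)

# Stag hunt: (stag, stag) and (hare, hare) are both equilibria.
# Starting from the miscoordinated (stag, hare), the dynamics
# settle on the risk-dominant (hare, hare).
payoffs = {
    ('stag', 'stag'): (4, 4), ('stag', 'hare'): (0, 3),
    ('hare', 'stag'): (3, 0), ('hare', 'hare'): (3, 3),
}
result = best_response_dynamics(payoffs, [['stag', 'hare']] * 2, ('stag', 'hare'))
print(result)
```

Note that which equilibrium is reached depends on the starting profile, which is exactly why the abstract version above initializes with random strategies.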

3. Coalitions and Negotiation

3.1 The Shapley Value (Dividing Coalition Gains)

Principle: allocate the payoff according to each player's contribution to the coalition.

\[\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \, [v(S \cup \{i\}) - v(S)]\]

import math
from itertools import combinations

def shapley_value(player, coalition_values, all_players):
    """
    Compute the Shapley value.

    Args:
        player: the target player
        coalition_values: dict {frozenset: value}
        all_players: the set of all players
    """
    n = len(all_players)
    shapley = 0

    # Enumerate every coalition that does not contain the player
    other_players = all_players - {player}

    for r in range(len(other_players) + 1):
        for subset in combinations(other_players, r):
            S = frozenset(subset)
            S_with_player = S | {player}

            # Marginal contribution of joining S
            marginal_contribution = coalition_values[S_with_player] - coalition_values[S]

            # Weight of this coalition size
            weight = (
                math.factorial(len(S)) *
                math.factorial(n - len(S) - 1) /
                math.factorial(n)
            )

            shapley += weight * marginal_contribution

    return shapley

# Example: a three-player game
coalition_values = {
    frozenset(): 0,
    frozenset(['A']): 10,
    frozenset(['B']): 10,
    frozenset(['C']): 10,
    frozenset(['A', 'B']): 30,  # A and B cooperate especially well
    frozenset(['A', 'C']): 25,
    frozenset(['B', 'C']): 25,
    frozenset(['A', 'B', 'C']): 60
}

all_players = frozenset(['A', 'B', 'C'])

for player in ['A', 'B', 'C']:
    value = shapley_value(player, coalition_values, all_players)
    print(f"Shapley value of {player}: {value:.2f}")
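
The subset-sum formula above has an equivalent form as an average of marginal contributions over all arrival orders, which makes a handy cross-check. A self-contained sketch reusing the same example values:

```python
from itertools import permutations

def shapley_by_permutations(coalition_values, players):
    """Shapley values via the ordering formula:
    phi_i = average, over all arrival orders, of i's marginal contribution."""
    players = list(players)
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        seen = frozenset()
        for p in order:
            # Marginal contribution of p joining the players already arrived
            phi[p] += coalition_values[seen | {p}] - coalition_values[seen]
            seen = seen | {p}
    return {p: v / len(orders) for p, v in phi.items()}

values = {
    frozenset(): 0,
    frozenset(['A']): 10, frozenset(['B']): 10, frozenset(['C']): 10,
    frozenset(['A', 'B']): 30, frozenset(['A', 'C']): 25, frozenset(['B', 'C']): 25,
    frozenset(['A', 'B', 'C']): 60,
}
phi = shapley_by_permutations(values, ['A', 'B', 'C'])
print(phi)  # A and B are symmetric; the three values sum to v(N) = 60
```

By hand: A's marginal contributions over the six orders are 10, 10, 20, 15, 35, 35, giving 125/6 ≈ 20.83; C gets 110/6 ≈ 18.33, reflecting the stronger A-B pairing.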

3.2 The Core

Definition: a payoff allocation under which no sub-coalition has an incentive to break away from the grand coalition.

\[\text{Core} = \left\{ x \in \mathbb{R}^n : \sum_{i \in N} x_i = v(N), \ \sum_{i \in S} x_i \geq v(S) \ \forall S \subseteq N \right\}\]

from itertools import combinations
from scipy.optimize import linprog

def find_core_allocation(coalition_values, all_players):
    """
    Find a core allocation via linear programming.
    """
    n = len(all_players)
    players = list(all_players)

    # Objective: any feasible point is a core allocation; since the total
    # is fixed by the equality constraint, the objective barely matters.
    c = [1] * n

    # Constraint 1: allocations sum to v(N)
    A_eq = [[1] * n]
    b_eq = [coalition_values[all_players]]

    # Constraint 2: every sub-coalition S receives at least v(S)
    A_ub = []
    b_ub = []

    for r in range(1, n):
        for subset in combinations(players, r):
            S = frozenset(subset)
            # sum_{i in S} x_i >= v(S)  <=>  -sum_{i in S} x_i <= -v(S)
            constraint = [-1 if p in subset else 0 for p in players]
            A_ub.append(constraint)
            b_ub.append(-coalition_values[S])

    # Solve (linprog's default bounds x >= 0 are fine here)
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)

    if result.success:
        return dict(zip(players, result.x))
    else:
        return None  # the core is empty

# Usage
allocation = find_core_allocation(coalition_values, all_players)
if allocation:
    print("Core allocation:")
    for player, value in allocation.items():
        print(f"{player}: {value:.2f}")
else:
    print("The core is empty (the coalition may be unstable)")

3.3 Dynamic Coalition Formation

class DynamicCoalitionGame:
    def __init__(self, players, coalition_values):
        self.players = players
        self.coalition_values = coalition_values
        # Everyone starts out on their own
        self.current_coalitions = {frozenset([p]) for p in players}

    def value(self, coalition):
        """Look up a coalition's value, defaulting to 0 if unlisted."""
        return self.coalition_values.get(coalition, 0)

    def should_merge(self, coalition1, coalition2):
        """Decide whether two coalitions should merge."""
        merged = coalition1 | coalition2

        # Compare the merged value with staying separate
        merged_value = self.value(merged)
        separate_value = self.value(coalition1) + self.value(coalition2)

        # Merge only if it creates surplus and can be divided fairly
        if merged_value > separate_value:
            return self.has_fair_allocation(merged)

        return False

    def has_fair_allocation(self, coalition):
        """Check that the coalition's value can be divided fairly."""
        # Simplified check via Shapley values. Note that by the
        # efficiency property, Shapley values always sum to the
        # coalition's value, so this is mainly a placeholder for a
        # real stability check (e.g. core non-emptiness).
        sub_values = {
            frozenset(s): self.value(frozenset(s))
            for r in range(len(coalition) + 1)
            for s in combinations(coalition, r)
        }
        shapley_sum = sum(
            shapley_value(p, sub_values, coalition)
            for p in coalition
        )

        return abs(shapley_sum - self.value(coalition)) < 1e-6

    def evolve_coalitions(self, rounds=10):
        """Let the coalition structure evolve greedily."""
        for round_num in range(rounds):
            print(f"\n=== Round {round_num + 1} ===")
            print(f"Current coalitions: {self.current_coalitions}")

            # Try every possible pairwise merge
            best_merge = None
            best_gain = 0

            coalitions_list = list(self.current_coalitions)
            for i in range(len(coalitions_list)):
                for j in range(i + 1, len(coalitions_list)):
                    c1, c2 = coalitions_list[i], coalitions_list[j]

                    if self.should_merge(c1, c2):
                        gain = (
                            self.value(c1 | c2) -
                            self.value(c1) -
                            self.value(c2)
                        )
                        if gain > best_gain:
                            best_gain = gain
                            best_merge = (c1, c2)

            # Perform the best merge found
            if best_merge:
                c1, c2 = best_merge
                self.current_coalitions.remove(c1)
                self.current_coalitions.remove(c2)
                self.current_coalitions.add(c1 | c2)
                print(f"Merged: {c1} + {c2} = {c1 | c2}, gain: {best_gain:.2f}")
            else:
                print("No beneficial merge left; the coalition structure is stable")
                break

        return self.current_coalitions

# Usage
game = DynamicCoalitionGame(
    players=['A', 'B', 'C', 'D'],
    coalition_values={
        frozenset(): 0,
        frozenset(['A']): 10,
        frozenset(['B']): 10,
        frozenset(['C']): 10,
        frozenset(['D']): 10,
        frozenset(['A', 'B']): 30,
        frozenset(['C', 'D']): 25,
        frozenset(['A', 'B', 'C']): 50,
        frozenset(['A', 'B', 'D']): 48,
        frozenset(['A', 'B', 'C', 'D']): 70
    }
)

final_coalitions = game.evolve_coalitions()
print(f"\nFinal coalitions: {final_coalitions}")

4. Case Studies

4.1 A Sanguosha AI

Game characteristics

  • 3-8 players with hidden roles (Lord, Loyalist, Rebel, Spy)
  • Coalitions shift dynamically (the Rebels unite against the Lord)
  • Incomplete information (you do not know the other players' roles)

class SanguoshaAI:
    def __init__(self, my_identity, visible_identities):
        self.my_identity = my_identity  # 'Lord', 'Loyalist', 'Rebel', 'Spy'
        self.visible_identities = visible_identities
        self.game_phase = 'early'  # 'early' or 'late'

    def evaluate_threat(self, player):
        """Score how threatening a player is to us."""
        if self.my_identity == 'Lord':
            # Lord's view: Rebel > Spy > Loyalist
            if player.suspected_identity == 'Rebel':
                return 10
            elif player.suspected_identity == 'Spy':
                return 5
            else:
                return 0

        elif self.my_identity == 'Rebel':
            # Rebel's view: Lord > Loyalist > Spy
            if player.is_lord:
                return 10
            elif player.suspected_identity == 'Loyalist':
                return 7
            else:
                return 3

        elif self.my_identity == 'Spy':
            # Spy's view: keep the sides balanced, remove the biggest threat.
            # Early on, help the Rebels wear down the Lord; harvest late.
            if self.game_phase == 'early':
                return 5 if player.is_lord else 0
            else:
                return player.current_hp  # high HP means high threat

        return 0  # Loyalist: mirror the Lord's threat model (simplified)

    def choose_target(self, possible_targets):
        """Pick an attack target."""
        # Weigh threat level, HP, hand size, and alliance
        scores = []

        for target in possible_targets:
            score = 0

            # Threat level
            score += self.evaluate_threat(target) * 10

            # HP (prefer finishing off low-HP targets)
            if target.hp <= 2:
                score += 20

            # Defensive equipment
            score -= len(target.armor) * 5

            # Alliance (avoid hitting allies)
            if self.is_ally(target):
                score -= 100

            scores.append((score, target))

        return max(scores, key=lambda x: x[0])[1]

    def is_ally(self, player):
        """Decide whether a player is an ally."""
        if self.my_identity == 'Lord':
            return player.suspected_identity == 'Loyalist'
        elif self.my_identity == 'Loyalist':
            return player.is_lord
        elif self.my_identity == 'Rebel':
            return player.suspected_identity == 'Rebel'
        else:  # Spy
            return False  # the Spy has no allies

4.2 A Werewolf AI

Game characteristics

  • Extreme information asymmetry
  • Requires deduction, deception, and persuasion
  • Decisions are made by vote

class WerewolfAI:
    def __init__(self, role):
        self.role = role  # 'Werewolf', 'Villager', 'Seer', 'Witch'
        self.beliefs = {}  # belief over each player's hidden role

    def update_beliefs(self, observations):
        """Update beliefs from observed actions."""
        for player, action in observations:
            if action == 'claim_seer':
                # Heuristic Bayesian-style update
                if player not in self.beliefs:
                    self.beliefs[player] = {'Seer': 0.5, 'Werewolf': 0.3, 'Villager': 0.2}
                else:
                    # If someone has already claimed Seer, this player is
                    # more likely a Werewolf counter-claiming
                    if any(b.get('Seer', 0) > 0.5 for b in self.beliefs.values()):
                        self.beliefs[player]['Werewolf'] += 0.3
                    else:
                        self.beliefs[player]['Seer'] += 0.3

    def vote(self, candidates):
        """Voting decision."""
        if self.role == 'Werewolf':
            # Werewolf: vote out the most threatening power role
            threat_scores = []
            for player in candidates:
                threat = 0
                if self.beliefs.get(player, {}).get('Seer', 0) > 0.5:
                    threat = 10
                elif self.beliefs.get(player, {}).get('Witch', 0) > 0.5:
                    threat = 8
                threat_scores.append((threat, player))

            return max(threat_scores, key=lambda x: x[0])[1]

        else:  # village team
            # Vote for the most likely Werewolf
            wolf_probs = []
            for player in candidates:
                prob = self.beliefs.get(player, {}).get('Werewolf', 0)
                wolf_probs.append((prob, player))

            return max(wolf_probs, key=lambda x: x[0])[1]

    def speak(self, phase):
        """Speech strategy."""
        if self.role == 'Werewolf':
            return self.wolf_speak_strategy(phase)
        elif self.role == 'Seer':
            return self.seer_speak_strategy(phase)
        else:
            return self.villager_speak_strategy(phase)

    def wolf_speak_strategy(self, phase):
        """Werewolf speech strategy."""
        if phase == 'early':
            # Lie low early; do not draw attention
            return "I'm a villager, I have no special information"
        elif phase == 'mid':
            # Mid game: steer the discussion, cast doubt on the real Seer
            return "I suspect player X is a fake Seer"
        else:
            # Late game: counter-attack
            return "By elimination, player Y must be a wolf"
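
The `+= 0.3` updates above are heuristic. A proper Bayesian update for the same "claims to be the Seer" observation looks like this; the prior and likelihood numbers are illustrative assumptions, not calibrated values:

```python
def bayes_update(prior, likelihood):
    """Posterior over hidden roles given one observed action.

    prior: {role: P(role)}; likelihood: {role: P(action | role)}.
    """
    unnorm = {role: prior[role] * likelihood.get(role, 0.0) for role in prior}
    total = sum(unnorm.values())
    if total == 0:
        return dict(prior)  # observation impossible under the model; keep the prior
    return {role: p / total for role, p in unnorm.items()}

# Assumed prior for one unknown player in a 9-player, 3-wolf setup
prior = {'Werewolf': 3 / 9, 'Seer': 1 / 9, 'Villager': 5 / 9}
# Assumed likelihoods of the action "claims to be the Seer" per role:
# the real Seer almost always claims, wolves sometimes counter-claim,
# villagers almost never do.
likelihood = {'Seer': 0.9, 'Werewolf': 0.4, 'Villager': 0.02}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these numbers the claim actually raises the werewolf probability above the seer probability, because the werewolf prior is three times larger; the update stays consistent no matter how many observations are chained.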

4.3 Multiplayer Strategy in Texas Hold'em

class TexasHoldemMultiplayer:
    def __init__(self, num_players):
        self.num_players = num_players

    def compute_nash_equilibrium(self, game_state):
        """Approximate an equilibrium strategy."""
        # Typically done with Counterfactual Regret Minimization (CFR);
        # a full implementation is beyond the scope of this sketch.
        pass

    def adjust_for_position(self, strategy, position, num_active):
        """Adjust the strategy by table position."""
        # On the button (acting last): play more aggressively
        # In the blinds (acting first): play more conservatively

        if position == num_active - 1:  # button
            strategy['raise_prob'] *= 1.5
            strategy['bluff_prob'] *= 1.3
        elif position < 2:  # blinds
            strategy['fold_prob'] *= 1.2
            strategy['raise_prob'] *= 0.8

        return strategy

    def exploit_weaknesses(self, opponent_models):
        """Exploit opponent tendencies."""
        exploits = []

        for opponent in opponent_models:
            if opponent['fold_freq'] > 0.7:
                # Folds too often -> bluff more
                exploits.append(('bluff_more', opponent['id']))
            elif opponent['call_freq'] > 0.6:
                # Calls too often -> bluff less, bet big for value
                exploits.append(('value_bet', opponent['id']))

        return exploits
5. Advanced Techniques

5.1 Opponent Modeling

class OpponentModel:
    def __init__(self, player_id):
        self.player_id = player_id
        self.action_history = []
        self.strategy_distribution = None

    def update(self, state, action):
        """Update the model with a new observation."""
        self.action_history.append((state, action))

        # Re-infer the strategy distribution from the history
        self.strategy_distribution = self.infer_strategy()

    def infer_strategy(self):
        """Infer the opponent's strategy from observed behavior."""
        # Count the opponent's behavior across situations
        strategy = {
            'aggressive': 0,
            'conservative': 0,
            'balanced': 0
        }

        for state, action in self.action_history:
            if action.is_aggressive:
                strategy['aggressive'] += 1
            elif action.is_conservative:
                strategy['conservative'] += 1
            else:
                strategy['balanced'] += 1

        # Normalize (guard against an empty history)
        total = sum(strategy.values())
        if total == 0:
            return {k: 1 / len(strategy) for k in strategy}
        return {k: v / total for k, v in strategy.items()}

    def predict_action(self, state):
        """Predict the opponent's next action."""
        if self.strategy_distribution['aggressive'] > 0.5:
            return self.simulate_aggressive_action(state)
        elif self.strategy_distribution['conservative'] > 0.5:
            return self.simulate_conservative_action(state)
        else:
            return self.simulate_balanced_action(state)

5.2 The Meta-Game

import random

class MetaGameAI:
    """An AI that accounts for opponents modeling it in turn."""

    def __init__(self):
        self.level = 2  # depth of "I know that you know that I know..."

    def level_k_reasoning(self, state, k):
        """Level-k reasoning."""
        if k == 0:
            # Level-0: a random or naive strategy
            return random.choice(state.legal_moves())

        # Level-k: assume the opponent reasons at level k-1
        opponent_action = self.level_k_reasoning(state, k - 1)

        # Best-respond to that level-(k-1) action
        # (best_response_to is a game-specific hook)
        best_response = self.best_response_to(state, opponent_action)

        return best_response

    def mixed_strategy(self, pure_strategies, weights):
        """Mixed strategy: randomize to avoid being predictable."""
        return random.choices(pure_strategies, weights=weights)[0]
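
Level-k reasoning is easiest to see in the classic "guess 2/3 of the average" game, where each level is one more best response to the level below; a minimal sketch (the anchor value 50 is the conventional level-0 naive guess):

```python
def level_k_guess(k, base=50.0, factor=2 / 3):
    """Level-k reasoning in the 'guess 2/3 of the average' game.

    A level-0 player guesses the naive anchor (here 50); a level-k player
    best-responds to a population of level-(k-1) players, i.e. guesses
    factor times their guess. As k grows, the guess spirals toward the
    Nash equilibrium of 0.
    """
    guess = base
    for _ in range(k):
        guess *= factor  # best response to a crowd of level-(k-1) players
    return guess

for k in range(4):
    print(k, round(level_k_guess(k), 2))
# level-0: 50.0, level-1: 33.33, level-2: 22.22, level-3: 14.81
```

Experimental play in this game typically looks like a mixture of levels 1-3 rather than the equilibrium, which is why bounded level-k models often predict human opponents better than a pure Nash model.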

6. Summary

Key points for multiplayer game AI design

  1. Algorithm choice
    • In the lead: Paranoid (defensive)
    • Evenly matched: MaxN (balanced)
    • Behind: aggressive play plus coalitions
  2. Coalition strategy
    • Use Shapley values to assess contributions
    • Look for core allocations to keep coalitions stable
    • Adjust coalitions dynamically
  3. Information handling
    • Opponent modeling (Bayesian updates)
    • Information hiding (mixed strategies)
    • Deception, and seeing through it
  4. Practical techniques
    • Exploit positional advantage
    • Exploit opponents' weaknesses
    • Think at the meta-game level

Suggested learning path

  1. Implement Paranoid and MaxN
  2. Learn the game-theory basics (Nash equilibrium, Shapley value)
  3. Practice on a simple multiplayer game (e.g., three-player tic-tac-toe)
  4. Move on to imperfect-information games (Werewolf, Texas Hold'em)

References

Books

  • "Game Theory: Analysis of Conflict" - Roger Myerson
  • "Multiagent Systems" - Yoav Shoham & Kevin Leyton-Brown

Games

  • Sanguosha - hidden roles + coalitions
  • Werewolf - deduction + deception
  • Catan - resource negotiation + alliances
  • Diplomacy - pure diplomatic play
