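Baidu Image Search Tool

An implementation of the Baidu image search API, derived from analyzing network requests. It supports batch searching and downloading of images.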

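Features

🔍 Smart search: keyword search built on the Baidu image search API
📊 Batch processing: search multiple keywords and multiple result pages in one run
⬇️ Smart download: download thumbnails and original images together
📁 Automatic organization: files are sorted by keyword and page number
⚙️ Flexible configuration: custom cookie, delay, directories, and other parameters
🖥️ Multiple modes: command-line and interactive operation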

Installation

uv sync

Usage

1. Quick start (Selenium mode)

uv run selenium_baidu_crawler.py

File structure

browsermcp-百度图片搜索/
├── selenium_baidu_crawler.py      # Selenium browser crawler
├── README.md                      # Usage instructions
└── requirements.txt               # Dependencies

Installing dependencies

pip install -r requirements.txt

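Quick start

Comparison of the two crawler modes

| Feature | API mode | Selenium mode |
| --- | --- | --- |
| Stability | Depends on the API endpoint, which may stop working | Browser-based, more stable |
| Anti-scraping resilience | IP is easily blocked | Simulates real user behavior |
| Image quality | May return only thumbnails | Retrieves full-resolution originals |
| Setup complexity | Requires Cookie configuration | Requires the Chrome browser |
| Speed | Fast | Relatively slow |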

1. Basic usage in API mode

from baidu_image_api_search import BaiduImageSearcher

# Create the searcher
searcher = BaiduImageSearcher()

# Search for images
images = searcher.search_and_save_multiple_pages("壁纸", pages=5)

# Download the images
searcher.download_images(images)

2. Selenium mode usage

2.1 Run directly

uv run selenium_baidu_crawler.py

2.2 Run with custom parameters

from selenium_baidu_crawler import SeleniumBaiduCrawler

# Create the crawler instance
crawler = SeleniumBaiduCrawler(
    keyword="高清壁纸",
    max_pages=10,
    binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    base_dir="my_wallpapers"
)

# Start crawling
result = crawler.start_crawling()

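Configuration

Selenium mode configuration

System requirements

Chrome browser: Chrome must be installed (latest version recommended)
ChromeDriver: a matching driver version is downloaded automatically
Memory: at least 4 GB (8 GB or more recommended)
Disk space: reserve enough space for the number of images you plan to download

Chrome browser path configuration

Change the Chrome browser path in selenium_baidu_crawler.py, or pass it through binary_location: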

from selenium_baidu_crawler import SeleniumBaiduCrawler

# Default path on Windows
crawler = SeleniumBaiduCrawler(
    binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
)

# Or a custom path
crawler = SeleniumBaiduCrawler(
    binary_location="D:\\Tools\\Chrome\\chrome.exe"
)

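Cookie configuration (optional)

Selenium mode usually does not need a manually configured Cookie. If you run into access restrictions, you can:

1. Open a browser and go to Baidu Images
2. Press F12 to open the developer tools
3. Switch to the Network tab
4. Search for any keyword
5. Find the acjson request and copy the Cookie from its Request Headers
6. Paste the Cookie into the self.headers variable in selenium_baidu_crawler.py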
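
As a rough illustration of the last step, the copied Cookie string could be merged into a headers dictionary as sketched here. The helper name build_headers, the Referer value, and the User-Agent string are assumptions made for this sketch; the actual layout of self.headers in selenium_baidu_crawler.py may differ.

```python
def build_headers(cookie: str) -> dict:
    """Return request headers carrying the Cookie copied from the browser."""
    cookie = cookie.strip()
    if not cookie:
        raise ValueError("cookie must not be empty")
    return {
        # Assumed values: adjust them to match what your browser actually sends.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://image.baidu.com/",
        "Cookie": cookie,
    }


# Placeholder cookie text; paste the real value copied from the developer tools.
headers = build_headers("  BAIDUID=example; BDUSS=example  ")
print(headers["Cookie"])
```

Stripping the pasted text guards against stray whitespace or newlines picked up while copying from the developer tools.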

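Download configuration

# Download options
DOWNLOAD_THUMBNAILS = True    # Download thumbnails
DOWNLOAD_ORIGINALS = True     # Download original images
DOWNLOAD_DELAY = 0.5          # Delay between downloads (seconds)
DOWNLOAD_TIMEOUT = 30         # Timeout (seconds)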
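
To show how these four options might interact, here is a minimal sketch of a download loop. The functions select_urls and download_all and the injected fetch callable are hypothetical names used only for illustration, and the thumb_url and original_url keys follow the API mode return structure. The project's own download code may be organized differently.

```python
import time

DOWNLOAD_THUMBNAILS = True
DOWNLOAD_ORIGINALS = True
DOWNLOAD_DELAY = 0.5
DOWNLOAD_TIMEOUT = 30


def select_urls(image: dict) -> list:
    """Pick the URLs to fetch for one image according to the download options."""
    urls = []
    if DOWNLOAD_THUMBNAILS and image.get("thumb_url"):
        urls.append(image["thumb_url"])
    if DOWNLOAD_ORIGINALS and image.get("original_url"):
        urls.append(image["original_url"])
    return urls


def download_all(images, fetch, sleep=time.sleep):
    """Fetch every selected URL, pausing DOWNLOAD_DELAY seconds after each request."""
    fetched = []
    for image in images:
        for url in select_urls(image):
            fetch(url, DOWNLOAD_TIMEOUT)  # fetch is expected to honor the timeout
            fetched.append(url)
            sleep(DOWNLOAD_DELAY)
    return fetched


# Demo with stand-in fetch and sleep functions, so nothing is downloaded or delayed.
demo = [{"thumb_url": "https://example.com/t.jpg",
         "original_url": "https://example.com/o.jpg"}]
fetched_urls = download_all(demo, fetch=lambda url, timeout: None, sleep=lambda s: None)
print(fetched_urls)
```

Passing fetch and sleep as arguments keeps the loop easy to try out without network access.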

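API usage in detail

Mode selection guide

API mode (BaiduImageSearcher): suited to fast bulk downloads; requires Cookie configuration
Selenium mode (SeleniumBaiduCrawler): suited to retrieving full-resolution originals; more stable

API mode: BaiduImageSearcher

Initialization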

from baidu_image_api_search import BaiduImageSearcher

searcher = BaiduImageSearcher(cookie="your_cookie")

Searching for images

# Search a single page
json_data = searcher.search_images("壁纸", page=0, per_page=30)

# Extract image information
images = searcher.extract_image_urls(json_data)

Batch search

# Search multiple pages and save the results
images = searcher.search_and_save_multiple_pages(
    "动漫",
    pages=10,
    output_dir="results"
)

Downloading images

searcher.download_images(
    images,
    download_dir="downloads",
    download_thumbs=True,
    download_originals=True
)

Selenium mode: SeleniumBaiduCrawler

Initialization parameters

from selenium_baidu_crawler import SeleniumBaiduCrawler

# Basic initialization
crawler = SeleniumBaiduCrawler(
    binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    keyword="壁纸",
    max_pages=20,
    base_dir="selenium_baidu_wallpapers"
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| binary_location | str | - | Path to the Chrome browser executable (required) |
| keyword | str | "壁纸" | Search keyword |
| max_pages | int | 20 | Maximum number of pages to crawl |
| base_dir | str | "selenium_baidu_wallpapers" | Output root directory |

Main methods

# Initialize the browser
crawler.init_driver()

# Run the full crawl task
result = crawler.start_crawling()

# Crawl a single page
crawler.crawl_page(page_num=1)

# Scroll to load more images
crawler.scroll_to_load_images(times=5)

# Extract image data
images = crawler.extract_image_data()

# Download a single image
result = crawler.download_image(image_info, save_dir, index)

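Return data structures

API mode return structure

Each image carries the following information:

{
    'thumb_url': 'thumbnail URL',
    'original_url': 'original image URL',
    'title': 'image title',
    'width': width,
    'height': height,
    'type': 'image type',
    'size': 'file size'
}

Selenium mode return structure

{
    'originalUrl': 'original image URL',
    'thumbUrl': 'thumbnail URL',
    'downloadUrl': 'direct download URL',
    'title': 'image title',
    'index': image_index
}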
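
Because the two modes name their fields differently (snake_case versus camelCase), code that handles both may want a common shape. The sketch below maps a Selenium mode record onto the API mode keys. The function name to_api_record is hypothetical, and falling back to downloadUrl when originalUrl is missing is an assumption, not documented behavior.

```python
def to_api_record(selenium_image: dict) -> dict:
    """Map a Selenium mode record onto the API mode key names."""
    original = selenium_image.get("originalUrl") or selenium_image.get("downloadUrl")
    return {
        "thumb_url": selenium_image.get("thumbUrl"),
        "original_url": original,  # assumed fallback to the direct download URL
        "title": selenium_image.get("title", ""),
    }


record = to_api_record({
    "originalUrl": "https://example.com/original.jpg",
    "thumbUrl": "https://example.com/thumb.jpg",
    "downloadUrl": "https://example.com/download.jpg",
    "title": "demo",
    "index": 1,
})
print(record)
```

Fields that exist only in API mode (width, height, type, size) are left out rather than guessed.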

Selenium mode task result

{
    "total_pages": 10,
    "total_images": 300,
    "results": [
        {
            "page": 1,
            "images_count": 30,
            "download_dir": "selenium_baidu_wallpapers/data/page_01",
            "download_results": [...]
        }
    ],
    "base_dir": "absolute path"
}
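
A result of this shape can be checked quickly after a run. The following sketch adds up the per-page counts and compares them with total_images. The summarize helper is a hypothetical name, and it relies only on the keys shown in the structure above.

```python
def summarize(result: dict) -> dict:
    """Add up per-page image counts and compare them with the reported total."""
    per_page = {page["page"]: page["images_count"] for page in result["results"]}
    counted = sum(per_page.values())
    return {
        "pages_reported": len(per_page),
        "images_counted": counted,
        "matches_total": counted == result["total_images"],
        "per_page": per_page,
    }


sample_result = {
    "total_pages": 2,
    "total_images": 55,
    "results": [
        {"page": 1, "images_count": 30},
        {"page": 2, "images_count": 25},
    ],
}
summary = summarize(sample_result)
print(summary)
```

A mismatch between the counted and reported totals is a hint that some page entries are missing from results.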

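File output format

API mode file format

URL list file

Each record lists the thumbnail URL (缩略图), original URL (原图), title (标题), dimensions (尺寸), type (类型), and size (大小), followed by a separator line: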

缩略图: https://img0.baidu.com/it/u=...,f=JPEG?w=300&h=300
原图: https://img0.baidu.com/it/u=...,f=JPEG?w=1920&h=1080
标题: 高清风景壁纸
尺寸: 1920x1080
类型: jpg
大小: 2.3MB
--------------------------------------------------
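
If the URL list needs to be read back into Python, a small parser along these lines may help. It is a sketch that assumes each field sits on its own line as "label: value" and that a line made only of dashes ends a record, as in the sample above. The English key names and the parse_url_list function are choices made for this example.

```python
# Map the labels written to the file onto English dictionary keys.
LABELS = {
    "缩略图": "thumb_url",
    "原图": "original_url",
    "标题": "title",
    "尺寸": "dimensions",
    "类型": "type",
    "大小": "size",
}


def parse_url_list(text: str) -> list:
    """Split the URL list text into one dictionary per image record."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"-"}:  # a separator line closes the current record
            if current:
                records.append(current)
                current = {}
            continue
        label, sep, value = line.partition(": ")
        if sep and label in LABELS:
            current[LABELS[label]] = value
    if current:  # keep a trailing record that has no closing separator
        records.append(current)
    return records


sample = "\n".join([
    "缩略图: https://example.com/thumb.jpg",
    "原图: https://example.com/original.jpg",
    "标题: demo",
    "尺寸: 1920x1080",
    "类型: jpg",
    "大小: 2.3MB",
    "-" * 50,
])
parsed = parse_url_list(sample)
print(parsed)
```

Splitting on the first ": " keeps the "https://" inside each URL intact.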

Selenium mode file format

Directory structure

selenium_baidu_wallpapers/
├── images/
│   ├── page_01/
│   │   ├── images/
│   │   │   ├── baidu_wallpaper_0001_abc123.jpg
│   │   │   └── baidu_wallpaper_0002_def456.png
│   │   ├── images.json
│   │   ├── urls.txt
│   │   └── download_results.json
│   └── page_02/
│       └── ...
├── data/
│   └── crawl_report.json
└── logs/

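Data file formats

images.json: the complete list of image information
urls.txt: image URLs and titles in plain-text form
download_results.json: download result statistics
crawl_report.json: the overall crawl report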
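
For post-processing, the per-page images.json files can be merged into one list. The sketch below assumes the images/page_NN/images.json layout from the directory structure above and that each file holds a JSON list. The load_all_images helper is a hypothetical name, and the demo builds a throwaway directory instead of touching real crawl output.

```python
import json
import tempfile
from pathlib import Path


def load_all_images(base_dir) -> list:
    """Merge the images.json files of every page directory, in page order."""
    images = []
    for path in sorted(Path(base_dir).glob("images/page_*/images.json")):
        with path.open(encoding="utf-8") as fh:
            images.extend(json.load(fh))
    return images


# Demo on a temporary directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    for page, titles in (("page_01", ["a", "b"]), ("page_02", ["c"])):
        page_dir = Path(tmp, "images", page)
        page_dir.mkdir(parents=True)
        records = [{"title": title} for title in titles]
        (page_dir / "images.json").write_text(json.dumps(records), encoding="utf-8")
    loaded = load_all_images(tmp)

print([image["title"] for image in loaded])
```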

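Notes

Cookie updates: Cookies expire and need to be refreshed periodically
Request rate: set a suitable download delay to avoid having your IP blocked
File size: large downloads may time out; raise the timeout if needed
Disk space: original images are usually large, so make sure enough space is available
Network environment: some networks may require a proxy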
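
One common way to cope with occasional timeouts while keeping the request rate low is to retry with a growing delay. The sketch below is a generic pattern, not code from this project. The with_retries helper and its parameters are hypothetical, and the demo uses a simulated failure instead of a real download.

```python
import time


def with_retries(action, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call action(); on an OSError wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return action()
        except OSError:  # TimeoutError and ConnectionError are subclasses of OSError
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * (2 ** attempt))


# Demo: a stand-in download that times out twice before succeeding.
calls = {"count": 0}
waits = []


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


outcome = with_retries(flaky_download, sleep=waits.append)
print(outcome, waits)
```

Doubling the wait after each failure backs off faster than a fixed delay when the server is throttling requests.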

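Troubleshooting

Common problems at a glance

| Problem | API mode solution | Selenium mode solution |
| --- | --- | --- |
| Empty search results | Check that the Cookie is valid | Check the network connection and the keyword |
| 403 Forbidden | Update the Cookie or use a proxy | Check the integrity of the browser installation |
| Download failures | Increase the delay or check the URL | Check disk space and the network |
| Program crashes | Check for changes to the API endpoint | Check Chrome version compatibility |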

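API-mode-specific issues

Empty search results
- Check that the keyword is correct
- Verify that the Cookie is valid
- Check the network connection

403 errors
- Update the Cookie
- Check the Referer header setting
- Use a proxy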

Selenium-mode-specific issues

Chrome browser not found

# Check the Chrome installation path
# Default paths on Windows:
"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
# or
"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"

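Rather than checking these paths by hand, a short helper can pick the first one that exists and pass it on as binary_location. This is a sketch: find_chrome is a hypothetical name, the candidate list only covers the two Windows defaults mentioned above, and the demo uses a temporary file as a stand-in for chrome.exe.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional

DEFAULT_CANDIDATES = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]


def find_chrome(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate path that points at an existing file, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


# Demo: a temporary file plays the role of the Chrome executable.
with tempfile.NamedTemporaryFile() as fake_chrome:
    found = find_chrome(["no/such/dir/chrome.exe", fake_chrome.name])
    matched = found == fake_chrome.name

print(matched)
```

When find_chrome returns None, Chrome is probably installed elsewhere and the path has to be supplied manually.
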
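ChromeDriver version mismatch
- Make sure the Chrome browser is up to date
- Use chromedriver-autoinstaller to match the version automatically

Out of memory
- Reduce the max_pages parameter
- Close other memory-hungry programs
- Use headless mode (uncomment --headless)

Page load timeout
- Increase the time.sleep() delay
- Check that the network connection is stable
- Adjust the number of scroll-load iterations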
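
A fixed time.sleep() either waits too long or not long enough. An alternative is to poll for a condition until a deadline, which is the idea behind Selenium's WebDriverWait. The sketch below shows that polling pattern in plain Python. The wait_until helper is a hypothetical name, and the demo condition is a simple counter, not a real page check.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)


# Demo: the condition becomes true on the third poll.
state = {"polls": 0}


def images_loaded():
    state["polls"] += 1
    return state["polls"] >= 3


ready = wait_until(images_loaded, timeout=5.0, interval=0.0)
print(ready, state["polls"])
```

In the crawler, the condition could be a check that the number of image elements on the page has stopped growing.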

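Performance optimization tips

Selenium mode optimization

# 1. Use headless mode (no visible browser window)
chrome_options.add_argument('--headless')

# 2. Reduce memory usage
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# 3. Tune the scroll-loading strategy
# Scroll fewer times and wait longer after each scroll
crawler.scroll_to_load_images(times=2)  # reduced from the default of 5 to 2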

Debug mode

API mode debugging

import logging
logging.basicConfig(level=logging.DEBUG)

Selenium mode debugging

import logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Inspect the browser console output
console_logs = crawler.driver.get_log('browser')
for log in console_logs:
    print(log)

Extensions

Custom search parameters

Custom parameters in API mode

# Modify the search API parameters
params = {
    'tn': 'resultjson_com',
    'word': keyword,
    'pn': 0,      # Start offset
    'rn': 30,     # Number of results to return
    'gsm': '1e'
}
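
To see how these parameters end up in a request, the sketch below encodes them into a query string. The endpoint in BASE_URL and the rule pn = page * per_page are assumptions made for illustration (the source only says that pn is the start offset and rn the number of results). The build_search_url helper is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint, based on the acjson request seen in the browser's Network tab.
BASE_URL = "https://image.baidu.com/search/acjson"


def build_search_url(keyword: str, page: int = 0, per_page: int = 30) -> str:
    """Encode the search parameters into a request URL for a zero-based page."""
    params = {
        "tn": "resultjson_com",
        "word": keyword,
        "pn": page * per_page,  # assumed: start offset = page index * page size
        "rn": per_page,
        "gsm": "1e",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("壁纸", page=2)
query = parse_qs(urlparse(url).query)
print(query["pn"], query["rn"])
```

urlencode percent-encodes the keyword as UTF-8, so non-ASCII search terms can be passed in directly.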

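Custom search in Selenium mode

# Custom search URL parameters
crawler = SeleniumBaiduCrawler(
    keyword="动漫壁纸",
    max_pages=15,
    binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    base_dir="custom_downloads"
)

# Modify the search page parameters (inside the get_search_url method)
params = {
    "width": "1920",      # Minimum width
    "height": "1080",     # Minimum height
    "face": "0",          # Exclude images with faces
    "istype": "2"         # Show high-definition images only
}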

Image filtering

Filtering in API mode

# Keep only images at least 1920 pixels wide
filtered = [img for img in images if img['width'] >= 1920]

# Keep only images in specific formats
allowed_formats = ['jpg', 'png']
filtered = [img for img in images if img['type'] in allowed_formats]
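
The two filters can be combined into one helper that also tolerates records with a missing or non-numeric width. This is a sketch: the filter_images function is a hypothetical name, and treating unparseable widths as "skip this image" is a design choice made for the example.

```python
def filter_images(images, min_width=0, allowed_formats=None):
    """Keep images that are wide enough and, optionally, in an allowed format."""
    kept = []
    for img in images:
        try:
            width = int(img.get("width", 0))
        except (TypeError, ValueError):
            continue  # skip records whose width cannot be read as a number
        if width < min_width:
            continue
        image_type = str(img.get("type", "")).lower()
        if allowed_formats is not None and image_type not in allowed_formats:
            continue
        kept.append(img)
    return kept


sample_images = [
    {"title": "wide jpg", "width": 1920, "type": "jpg"},
    {"title": "narrow png", "width": 800, "type": "png"},
    {"title": "wide gif", "width": "2560", "type": "GIF"},
    {"title": "no width", "width": None, "type": "jpg"},
]
selected = filter_images(sample_images, min_width=1920, allowed_formats=["jpg", "png"])
print([img["title"] for img in selected])
```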

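Filtering in Selenium mode

# Filter by a keyword in the title (file size is not part of the extracted data)
images = crawler.extract_image_data()
filtered = [img for img in images if 'large' in img.get('title', '').lower()]

# Filter by a feature of the URL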
filtered = [img for img in images if 'large' in img.get('originalUrl', '')]

Batch keyword processing

Batch processing in Selenium mode

keywords = ["动漫壁纸", "风景壁纸", "游戏壁纸", "美女壁纸"]

for keyword in keywords:
    print(f"Processing keyword: {keyword}")
    crawler = SeleniumBaiduCrawler(
        keyword=keyword,
        max_pages=5,
        binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        base_dir=f"wallpapers_{keyword}"
    )
    result = crawler.start_crawling()
    print(f"{keyword} done, {result['total_images']} images downloaded")

Practical examples

Scenario 1: Quickly download high-definition wallpapers

# Use Selenium mode to fetch high-definition wallpapers
from selenium_baidu_crawler import SeleniumBaiduCrawler

# Create the crawler instance
crawler = SeleniumBaiduCrawler(
    keyword="4K壁纸",
    max_pages=10,
    binary_location="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    base_dir="4k_wallpapers"
)

# Start crawling
result = crawler.start_crawling()
print(f"Download complete! Retrieved {result['total_images']} 4K wallpapers")

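Scenario 3: Quick test in API mode

from baidu_image_api_search import BaiduImageSearcher

# Quick check that the API is reachable
searcher = BaiduImageSearcher(cookie="your_cookie_here")
json_data = searcher.search_images("测试", page=0, per_page=5)
# search_images returns the raw JSON, so extract the image list before counting
images = searcher.extract_image_urls(json_data)
print(f"Found {len(images)} test images")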

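Changelog

v2.0.0 (current version)

✅ Added the Selenium browser crawler mode
✅ Chrome browser automation support
✅ Automatic scrolling to load more images
✅ Improved file organization
✅ Better error handling and retry logic

v1.0.0

✅ Basic API search
✅ Batch download support
✅ Cookie configuration system
✅ Multithreaded download optimization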

Configuring the Playwright MCP tool

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}