-
-A real-time interactive streaming digital human engine enabling synchronized audio-video conversation, widely adopted in commercial applications.
-
-**Demos**: [wav2lip](https://youtu.be/-ss0H8qLr7E) | [ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk](https://youtu.be/vzUMruoZlxc/)
-
-Domestic Mirror:
-
----
-
-## Features
-1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
-2. Supports voice cloning
-3. Supports interrupting the digital human while speaking
-4. Supports full-body video stitching
-5. Supports WebRTC, RTMP, and virtual camera output
-6. Supports action choreography: plays custom videos when not speaking
-7. Supports multi-concurrency
-8. Supports custom digital human avatars
-9. Provides frontend API integration
-
----
-
-## Usage Scenarios
-
-LiveTalking uses real-time streaming digital human technology to drive virtual avatars with text or voice, and can be combined with an LLM for intelligent conversation. It suits the following scenarios:
-
-| Scenario | Description |
-|----------|-------------|
-| **Virtual Streamer / Live Commerce** | 24/7 unmanned live streaming with LLM-generated sales scripts and action choreography for natural performance |
-| **AI Digital Human Customer Service** | Integrate enterprise knowledge bases for real-time voice Q&A with interruption support |
-| **Online Education / Training** | Digital avatar of a teacher for course recording, or API-driven digital instructor for real-time lectures |
-| **Intelligent Voice Assistant** | Pair with smart speakers or apps, calling the `/human` API to drive digital human voice interactions |
-| **Large Screen Presentation** | Digital human presenter for exhibition halls, event venues, and other content narration scenarios |
-| **Batch Short Video Creation** | Submit scripts in batch via API to generate digital human videos without real-person filming, using `/human` + `/record` APIs |
-
-**Core Flow**: User input (text/audio) → LLM response (optional) → TTS speech synthesis → Real-time lip-sync → Audio/video streaming output
-
----
-
-## 1. Installation
-
-Tested on Ubuntu 24.04, Python 3.12, PyTorch 2.9.1, CUDA 13.0.
-
-### 1.1 Install Dependencies
-
-```bash
-git clone https://github.com/lipku/LiveTalking.git
-conda create -n livetalking python=3.12
-conda activate livetalking
-# If your CUDA version is not 13.0 (check with nvidia-smi), install the matching PyTorch build (see https://pytorch.org/get-started/previous-versions)
-pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
-cd LiveTalking
-pip install -r requirements.txt
-```
-
-Installation FAQ:
-
-Linux CUDA environment setup:
-
----
-
-## 2. Quick Start
-
-### 2.1 Download Models
-
-| Source | Link |
-|--------|------|
-| Quark Cloud | |
-| Google Drive | |
-
-1. Copy `wav2lip256.pth` to the project's `models/` directory and rename it to `wav2lip.pth`
-2. Extract `wav2lip256_avatar1.tar.gz` and copy the entire extracted folder to `data/avatars/`
-
-### 2.2 Start the Server
-
-```bash
-python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
-```
-
-> **Note**: The server must open ports TCP:8010 and UDP:1-65535
-
-### 2.3 Client Access
-
-| Method | Description |
-|--------|-------------|
-| Browser | Open `http://serverip:8010/index.html`, click "Start Connection" to play the digital human video, then enter text and submit |
-| API | See [API Docs](docs/api.md) for HTTP-based integration |
-| Desktop App | Download: |
-
-### 2.4 Web Pages
-
-| Page | URL | Description |
-|------|-----|-------------|
-| Home | `/index.html` | WebRTC connection + text/audio driver + recording control |
-| Avatar Creator | `/avatar.html` | Upload video to auto-generate digital human avatars |
-| Admin Console | `/admin.html` | Real-time session monitoring & global configuration |
-
-
-
-### 2.5 Quick Experience
-
-Create an instance with a cloud image to run instantly:
-
-- [UCloud Image](https://www.compshare.cn/images/4458094e-a43d-45fe-9b57-de79253befe4?referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking)
-
-### 2.6 Documentation
-
-
----
-
-## 3. Architecture
-
-### Dataflow Diagram
-
-
-
-### Layer Overview
-
-**API Layer**
-- `/human`: Accepts text, supporting echo (direct playback) and chat (LLM conversation) modes
-- `/humanaudio`: Accepts audio files for direct playback
-- Each connection is assigned a unique `sessionid`, supporting multi-user concurrency
-
-**Logic Layer**
-- **LLM Engine**: Integrates with models like Qwen to generate conversational responses
-- **TTS Engine**: Modular design supporting EdgeTTS, GPT-SoVITS, CosyVoice, Tencent Cloud, and more
-- **Feature Extraction**: Synchronously extracts acoustic features (e.g., Mel spectrograms) for lip-sync inference
-
-**Rendering Layer**
-- **Model Inference**: Uses deep learning models (Wav2Lip, MuseTalk, etc.) to generate lip-sync frames from audio features
-- **Post-Processing**: Smoothly overlays the generated mouth region back onto the original high-definition video
-
-**Streaming Layer**
-- **WebRTC**: Low-latency browser-based streaming
-- **RTMP**: Standard live streaming protocol, supports pushing to platforms like Bilibili/YouTube
-- **Virtual Camera**: Outputs as a system camera device
-
-**Plugin System**
-- Decentralized registration mechanism based on [registry.py](registry.py), allowing developers to extend TTS, Avatar, and Output modules
-
----
-
-## 4. API Documentation
-
-| Document | Description |
-|----------|-------------|
-| [docs/api.md](docs/api.md) | General API — WebRTC, text/audio driver, recording, action choreography |
-| [docs/avatar_api.md](docs/avatar_api.md) | Avatar Generation API — create tasks, query progress, delete tasks |
-| [docs/admin_api.md](docs/admin_api.md) | Admin API — global config, session monitoring, force stop |
-
----
-
-## 5. Docker
-
-Available images:
-- **AutoDL**: — [Tutorial](https://doc.livetalking.ai/en/docs/autodl/)
-- **UCloud**: — Supports opening any port, no additional SRS deployment required — [Tutorial](https://doc.livetalking.ai/en/docs/ucloud/)
-
-> AutoDL cannot open UDP ports, so you need to deploy SRS or TURN relay service separately.
-
----
-
-## 6. Performance
-
-- Video encoding for each stream consumes CPU, and higher resolutions use more CPU. Lip-sync inference for each stream consumes GPU
-- The number of concurrent idle (non-speaking) sessions is limited by CPU; the number of concurrently speaking sessions is limited by GPU
-- In backend logs: `inferfps` = GPU inference frame rate, `finalfps` = final streaming frame rate. Both must be >= 25 for real-time performance
-
-### Real-Time Inference Performance
-
-| Model | GPU | FPS |
-|:------|:----|:----|
-| wav2lip256 | RTX 3060 | 60 |
-| wav2lip256 | RTX 3080Ti | 120 |
-| musetalk | RTX 3080Ti | 42 |
-| musetalk | RTX 3090 | 45 |
-| musetalk | RTX 4090 | 72 |
-
-- wav2lip256: RTX 3060 or higher recommended
-- musetalk: RTX 3080Ti or higher recommended
-
----
-
-## 7. Statement
-
-Videos developed based on this project and published on platforms such as Bilibili, WeChat Channels, and Douyin must include the LiveTalking watermark and logo.
-
----
-
-If this project is helpful to you, please give it a Star. Contributors interested in improving this project are also welcome.
-
-| Community | Link |
-|-----------|------|
-| Knowledge Planet | |
-| WeChat | wxwubug (add a note when requesting to join the group) |
-| Telegram | |
-| Discord | |
-| Email | lipku@foxmail.com |
-| WeChat Official | 数字人技术 |
-
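
The `/human` endpoint listed under the API Layer above accepts text in `echo` or `chat` mode, scoped to a `sessionid`. A minimal sketch of building such a request with the standard library follows. The JSON field names (`sessionid`, `type`, `text`) and the helper `build_human_request` are assumptions for illustration only; the actual schema is defined in `docs/api.md`.

```python
import json
import urllib.request


def build_human_request(base_url: str, sessionid: str, text: str,
                        mode: str = "echo") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /human endpoint."""
    if mode not in ("echo", "chat"):
        raise ValueError("mode must be 'echo' or 'chat'")
    # Field names are assumptions; check docs/api.md for the real schema.
    payload = {"sessionid": sessionid, "type": mode, "text": text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/human",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_human_request("http://serverip:8010", "demo-session", "Hello")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

The request is only constructed here, so the sketch runs without a server; replace `serverip` with the real host before sending.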
-
diff --git a/README.md b/README.md
index fa37432d..508192a6 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,7 @@
-中文版 | [English](./README-EN.md)
-
+English | [中文版](./README.md)
@@ -16,205 +15,203 @@
-实时交互流式数字人引擎,实现音视频同步对话,已在业内获得广泛商用
+A real-time interactive streaming digital human engine enabling synchronized audio-video conversation, widely adopted in commercial applications.
-**效果演示**: [wav2lip](https://www.bilibili.com/video/BV1scwBeyELA/) | [ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk](https://www.bilibili.com/video/BV1bUwezvEnG/)
+**Demos**: [wav2lip](https://youtu.be/-ss0H8qLr7E) | [ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk](https://youtu.be/vzUMruoZlxc/)
-国内镜像:
+Domestic Mirror:
---
## Features
-1. 支持多种数字人模型: ernerf、musetalk、wav2lip、Ultralight-Digital-Human
-2. 支持声音克隆
-3. 支持数字人说话被打断
-4. 支持全身视频拼接
-5. 支持 WebRTC、RTMP、虚拟摄像头输出
-6. 支持动作编排:不说话时播放自定义视频
-7. 支持多并发
-8. 支持自定义数字人形象
-9. 提供前端API接口对接
+1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
+2. Supports voice cloning
+3. Supports interrupting the digital human while speaking
+4. Supports full-body video stitching
+5. Supports WebRTC, RTMP, and virtual camera output
+6. Supports action choreography: plays custom videos when not speaking
+7. Supports multi-concurrency
+8. Supports custom digital human avatars
+9. Provides frontend API integration
---
-## 使用场景
+## Usage Scenarios
-LiveTalking 基于实时流式数字人技术,通过文本或语音驱动虚拟形象说话,结合 LLM 实现智能对话。适用于以下场景:
+LiveTalking uses real-time streaming digital human technology to drive virtual avatars with text or voice, and can be combined with an LLM for intelligent conversation. It suits the following scenarios:
-| 场景 | 说明 |
-|------|------|
-| **虚拟主播/直播带货** | 24 小时无人直播,通过 LLM 自动生成带货话术,配合动作编排实现自然表现 |
-| **AI 数字人客服** | 接入企业知识库,用户语音提问,数字人实时回答,支持打断重说 |
-| **在线教育/培训** | 教师数字分身录制课程,或通过 API 驱动数字人讲师实时授课 |
-| **智能语音助手** | 结合智能音箱或 APP,调用 `/human` 接口驱动数字人进行语音对话交互 |
-| **大屏讲解** | 数字人讲解员在展厅大屏、活动现场等场景进行内容讲解和互动 |
-| **短视频批量制作** | 通过 API 批量提交文案生成数字人出镜视频,无需真人拍摄,调用 `/human` + `/record` 接口 |
+| Scenario | Description |
+|----------|-------------|
+| **Virtual Streamer / Live Commerce** | 24/7 unmanned live streaming with LLM-generated sales scripts and action choreography for natural performance |
+| **AI Digital Human Customer Service** | Integrate enterprise knowledge bases for real-time voice Q&A with interruption support |
+| **Online Education / Training** | Digital avatar of a teacher for course recording, or API-driven digital instructor for real-time lectures |
+| **Intelligent Voice Assistant** | Pair with smart speakers or apps, calling the `/human` API to drive digital human voice interactions |
+| **Large Screen Presentation** | Digital human presenter for exhibition halls, event venues, and other content narration scenarios |
+| **Batch Short Video Creation** | Submit scripts in batch via API to generate digital human videos without real-person filming, using `/human` + `/record` APIs |
-**核心流程**:用户输入文字/音频 → LLM 生成回复(可选)→ TTS 合成语音 → 数字人实时口型同步 → 音视频推流输出
+**Core Flow**: User input (text/audio) → LLM response (optional) → TTS speech synthesis → Real-time lip-sync → Audio/video streaming output
---
-## 1. 安装
+## 1. Installation
-已在 Ubuntu 22.04、Python 3.12、PyTorch 2.9.1、CUDA 13.0 测试通过。
+Tested on Ubuntu 24.04, Python 3.12, PyTorch 2.9.1, CUDA 13.0.
-### 1.1 安装依赖
+### 1.1 Install Dependencies
```bash
git clone https://github.com/lipku/LiveTalking.git
conda create -n livetalking python=3.12
conda activate livetalking
-# 如果 CUDA 版本不为 13.0 (运行 nvidia-smi 确认),请根据 PyTorch 官网(https://pytorch.org/get-started/previous-versions)安装对应版本
+# If your CUDA version is not 13.0 (check with nvidia-smi), install the matching PyTorch build (see https://pytorch.org/get-started/previous-versions)
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
cd LiveTalking
pip install -r requirements.txt
```
-安装常见问题:[FAQ](https://doc.livetalking.ai/docs/faq/)
+Installation FAQ:
-Linux CUDA 环境搭建参考:
+Linux CUDA environment setup:
---
-## 2. 快速开始
+## 2. Quick Start
-### 2.1 下载模型
+### 2.1 Download Models
-| 网盘 | 地址 |
-|------|------|
-| 夸克云盘 | |
+| Source | Link |
+|--------|------|
+| Quark Cloud | |
| Google Drive | |
-1. 将 `wav2lip256.pth` 拷贝到项目的 `models/` 目录下,重命名为 `wav2lip.pth`
-2. 将 `wav2lip256_avatar1.tar.gz` 解压后整个文件夹拷贝到 `data/avatars/` 目录下
+1. Copy `wav2lip256.pth` to the project's `models/` directory and rename it to `wav2lip.pth`
+2. Extract `wav2lip256_avatar1.tar.gz` and copy the entire extracted folder to `data/avatars/`
-### 2.2 启动服务
+### 2.2 Start the Server
```bash
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
```
+> **Note**: The server must open ports TCP:8010 and UDP:1-65535
-> **注意**: 服务端需开放端口 TCP:8010, UDP:1-65536
-
-
-### 2.3 客户端接入
+### 2.3 Client Access
-| 方式 | 说明 |
-|------|------|
-| 浏览器 | 打开 `http://serverip:8010/index.html`,点击"开始连接"播放数字人视频,在文本框输入文字提交即可 |
-| API 调用 | 参考 [API 文档](docs/api.md) 通过 HTTP 接口驱动 |
-| 桌面客户端 | 下载地址: |
+| Method | Description |
+|--------|-------------|
+| Browser | Open `http://serverip:8010/index.html`, click "Start Connection" to play the digital human video, then enter text and submit |
+| API | See [API Docs](docs/api.md) for HTTP-based integration |
+| Desktop App | Download: |
-### 2.4 Web 页面
+### 2.4 Web Pages
-| 页面 | 地址 | 说明 |
-|------|------|------|
-| 首页 | `/index.html` | WebRTC 连接 + 文本/音频驱动 + 录制控制 |
-| Avatar 生成 | `/avatar.html` | 上传视频自动生成数字人形象 |
-| 管理后台 | `/admin.html` | 实时监控会话状态与全局配置 |
+| Page | URL | Description |
+|------|-----|-------------|
+| Home | `/index.html` | WebRTC connection + text/audio driver + recording control |
+| Avatar Creator | `/avatar.html` | Upload video to auto-generate digital human avatars |
+| Admin Console | `/admin.html` | Real-time session monitoring & global configuration |
-### 2.5 快速体验
+### 2.5 Quick Experience
-使用在线镜像创建实例即可运行:
+Create an instance with a cloud image to run instantly:
-- [UCloud 镜像](https://www.compshare.cn/images/4458094e-a43d-45fe-9b57-de79253befe4?referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking)
+- [UCloud Image](https://www.compshare.cn/images/4458094e-a43d-45fe-9b57-de79253befe4?referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking)
+
+### 2.6 Documentation
+
-### 2.6 使用说明
-
---
-## 3. 系统架构
+## 3. Architecture
-### 数据流图
+### Dataflow Diagram
+### Layer Overview
-### 各层说明
-
-**API 层**
-- `/human`: 接收文本,支持 echo(直接复读)和 chat(LLM 对话)模式
-- `/humanaudio`: 接收音频文件直接播放
-- 每个连接分配唯一 `sessionid`,支持多用户并发
+**API Layer**
+- `/human`: Accepts text, supporting echo (direct playback) and chat (LLM conversation) modes
+- `/humanaudio`: Accepts audio files for direct playback
+- Each connection is assigned a unique `sessionid`, supporting multi-user concurrency
-**逻辑层**
-- **LLM 引擎**: 对接 Qwen 等大模型生成对话回复
-- **TTS 引擎**: 模块化设计,支持 EdgeTTS、GPT-SoVITS、CosyVoice、腾讯云等多种方案
-- **特征提取**: 同步提取音频的声学特征(如 Mel 频谱),用于口型推理
+**Logic Layer**
+- **LLM Engine**: Integrates with models like Qwen to generate conversational responses
+- **TTS Engine**: Modular design supporting EdgeTTS, GPT-SoVITS, CosyVoice, Tencent Cloud, and more
+- **Feature Extraction**: Synchronously extracts acoustic features (e.g., Mel spectrograms) for lip-sync inference
-**渲染层**
-- **模型推理**: 使用深度学习模型 (Wav2Lip, MuseTalk 等) 根据音频特征生成口型画面
-- **后处理**: 将生成的口型区域平滑贴回原始高清视频
+**Rendering Layer**
+- **Model Inference**: Uses deep learning models (Wav2Lip, MuseTalk, etc.) to generate lip-sync frames from audio features
+- **Post-Processing**: Smoothly overlays the generated mouth region back onto the original high-definition video
-**推流层**
-- **WebRTC**: 低延迟浏览器端推流
-- **RTMP**: 标准直播协议,支持推流到 B站/YouTube 等平台
-- **虚拟摄像头**: 输出为系统摄像头设备
+**Streaming Layer**
+- **WebRTC**: Low-latency browser-based streaming
+- **RTMP**: Standard live streaming protocol, supports pushing to platforms like Bilibili/YouTube
+- **Virtual Camera**: Outputs as a system camera device
-**插件系统**
-- 基于 [registry.py](registry.py) 的去中心化注册机制,开发者可自行扩展 TTS、Avatar、Output 模块
+**Plugin System**
+- Decentralized registration mechanism based on [registry.py](registry.py), allowing developers to extend TTS, Avatar, and Output modules
---
-## 4. API 接口
+## 4. API Documentation
-| 文档 | 说明 |
-|------|------|
-| [docs/api.md](docs/api.md) | 通用业务 API — WebRTC、文本/音频驱动、录制、动作编排 |
-| [docs/avatar_api.md](docs/avatar_api.md) | Avatar 生成 API — 创建任务、查询进度、删除任务 |
-| [docs/admin_api.md](docs/admin_api.md) | Admin 管理 API — 全局配置、会话监控、强制停止 |
+| Document | Description |
+|----------|-------------|
+| [docs/api.md](docs/api.md) | General API — WebRTC, text/audio driver, recording, action choreography |
+| [docs/avatar_api.md](docs/avatar_api.md) | Avatar Generation API — create tasks, query progress, delete tasks |
+| [docs/admin_api.md](docs/admin_api.md) | Admin API — global config, session monitoring, force stop |
---
-## 5. Docker 运行
+## 5. Docker
-镜像说明:
-- **AutoDL**: — [教程](https://doc.livetalking.ai/docs/autodl/)
-- **UCloud**: — 支持开放任意端口,无需额外部署 SRS — [教程](https://doc.livetalking.ai/docs/ucloud/)
+Available images:
+- **AutoDL**: — [Tutorial](https://doc.livetalking.ai/en/docs/autodl/)
+- **UCloud**: — Supports opening any port, no additional SRS deployment required — [Tutorial](https://doc.livetalking.ai/en/docs/ucloud/)
-> AutoDL 由于不能开放 UDP 端口,需自行部署 SRS 或 TURN 转发服务。
+> AutoDL cannot open UDP ports, so you need to deploy SRS or TURN relay service separately.
---
-## 6. 性能指标
+## 6. Performance
-- 每路视频压缩消耗 CPU,分辨率越高 CPU 消耗越大;每路口型推理消耗 GPU
-- 不说话时并发数取决于 CPU,同时说话并发数取决于 GPU
-- 后端日志 `inferfps` = GPU 推理帧率, `finalfps` = 最终推流帧率,两者均需 >=25 才算实时
+- Video encoding for each stream consumes CPU, and higher resolutions use more CPU. Lip-sync inference for each stream consumes GPU
+- The number of concurrent idle (non-speaking) sessions is limited by CPU; the number of concurrently speaking sessions is limited by GPU
+- In backend logs: `inferfps` = GPU inference frame rate, `finalfps` = final streaming frame rate. Both must be >= 25 for real-time performance
-### 实时推理性能
+### Real-Time Inference Performance
-| 模型 | 显卡 | FPS |
-|:------|:------|:----|
+| Model | GPU | FPS |
+|:------|:----|:----|
| wav2lip256 | RTX 3060 | 60 |
| wav2lip256 | RTX 3080Ti | 120 |
| musetalk | RTX 3080Ti | 42 |
| musetalk | RTX 3090 | 45 |
| musetalk | RTX 4090 | 72 |
-- wav2lip256 推荐 RTX 3060 及以上
-- musetalk 推荐 RTX 3080Ti 及以上
+- wav2lip256: RTX 3060 or higher recommended
+- musetalk: RTX 3080Ti or higher recommended
---
-## 7. 声明
+## 7. Statement
-基于本项目开发并发布在B站、视频号、抖音等平台上的视频需带上 LiveTalking 水印和标识。
+Videos developed based on this project and published on platforms such as Bilibili, WeChat Channels, and Douyin must include the LiveTalking watermark and logo.
---
-如果本项目对你有帮助,帮忙点个 Star。也欢迎感兴趣的朋友一起来完善该项目。
+If this project is helpful to you, please give it a Star. Contributors interested in improving this project are also welcome.
-| 社区 | 链接 |
-|------|------|
-| 知识星球 | |
-| 微信 | wxwubug (加群请备注) |
+| Community | Link |
+|-----------|------|
+| Knowledge Planet | |
+| WeChat | wxwubug (add a note when requesting to join the group) |
| Telegram | |
| Discord | |
| Email | lipku@foxmail.com |
-| 微信公众号 | 数字人技术 |
+| WeChat Official | 数字人技术 |
diff --git a/app.py b/app.py
index ce496206..3a287cff 100644
--- a/app.py
+++ b/app.py
@@ -194,7 +194,7 @@ def main():
elif opt.transport=='rtcpush':
pagename='rtcpushapi.html'
logger.info('start http server; http://:'+str(opt.listenport)+'/'+pagename)
- # logger.info('如果使用webrtc,推荐访问webrtc集成前端: http://:'+str(opt.listenport)+'/dashboard.html')
+ # logger.info('If using WebRTC, it is recommended to access the WebRTC integrated frontend: http://:'+str(opt.listenport)+'/dashboard.html')
def run_server(runner):
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
diff --git a/llm.py b/llm.py
index 9be5e19f..4f5f2569 100644
--- a/llm.py
+++ b/llm.py
@@ -8,51 +8,57 @@
def llm_response(message,avatar_session:'BaseAvatar',datainfo:dict={}):
try:
opt = avatar_session.opt
- start = time.perf_counter()
- from openai import OpenAI
- client = OpenAI(
- # 如果您没有配置环境变量,请在此处用您的API Key进行替换
- api_key=os.getenv("DASHSCOPE_API_KEY"),
- # 填写DashScope SDK的base_url
- base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
- )
- end = time.perf_counter()
- logger.info(f"llm Time init: {end-start}s,{message}")
- completion = client.chat.completions.create(
- model="qwen-plus",
- messages=[{'role': 'system', 'content': '你是一个知识助手,尽量以简短、口语化的方式输出'},
- {'role': 'user', 'content': message}],
- stream=True,
- # 通过以下设置,在流式输出的最后一行展示token使用信息
- stream_options={"include_usage": True}
- )
- result=""
- first = True
- for chunk in completion:
- if len(chunk.choices)>0:
- #print(chunk.choices[0].delta.content)
- if first:
- end = time.perf_counter()
- logger.info(f"llm Time to first chunk: {end-start}s")
- first = False
- msg = chunk.choices[0].delta.content
- if msg is None:
- continue
- lastpos=0
- #msglist = re.split('[,.!;:,。!?]',msg)
- for i, char in enumerate(msg):
- if char in ",.!;:,。!?:;" :
- result = result+msg[lastpos:i+1]
- lastpos = i+1
- if len(result)>10:
- logger.info(result)
- avatar_session.put_msg_txt(result,datainfo)
- result=""
- result = result+msg[lastpos:]
- end = time.perf_counter()
- logger.info(f"llm Time to last chunk: {end-start}s")
- if result:
- avatar_session.put_msg_txt(result,datainfo)
+ # Static response to avoid using paid third-party services
+ static_response = f"收到,这是本地静态测试回复。你发送的消息是:{message}"
+ logger.info(f"Static LLM response: {static_response}")
+ avatar_session.put_msg_txt(static_response, datainfo)
+ return
+
+ # start = time.perf_counter()
+ # from openai import OpenAI
+ # client = OpenAI(
+ # # 如果您没有配置环境变量,请在此处用您的API Key进行替换
+ # api_key=os.getenv("DASHSCOPE_API_KEY"),
+ # # 填写DashScope SDK的base_url
+ # base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
+ # )
+ # end = time.perf_counter()
+ # logger.info(f"llm Time init: {end-start}s,{message}")
+ # completion = client.chat.completions.create(
+ # model="qwen-plus",
+ # messages=[{'role': 'system', 'content': '你是一个知识助手,尽量以简短、口语化的方式输出'},
+ # {'role': 'user', 'content': message}],
+ # stream=True,
+ # # 通过以下设置,在流式输出的最后一行展示token使用信息
+ # stream_options={"include_usage": True}
+ # )
+ # result=""
+ # first = True
+ # for chunk in completion:
+ # if len(chunk.choices)>0:
+ # #print(chunk.choices[0].delta.content)
+ # if first:
+ # end = time.perf_counter()
+ # logger.info(f"llm Time to first chunk: {end-start}s")
+ # first = False
+ # msg = chunk.choices[0].delta.content
+ # if msg is None:
+ # continue
+ # lastpos=0
+ # #msglist = re.split('[,.!;:,。!?]',msg)
+ # for i, char in enumerate(msg):
+ # if char in ",.!;:,。!?:;" :
+ # result = result+msg[lastpos:i+1]
+ # lastpos = i+1
+ # if len(result)>10:
+ # logger.info(result)
+ # avatar_session.put_msg_txt(result,datainfo)
+ # result=""
+ # result = result+msg[lastpos:]
+ # end = time.perf_counter()
+ # logger.info(f"llm Time to last chunk: {end-start}s")
+ # if result:
+ # avatar_session.put_msg_txt(result,datainfo)
except Exception as e:
logger.exception('llm exceptiopn:')
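
The streaming logic that this change comments out buffers LLM deltas, cuts at punctuation, and forwards a segment to the avatar once it is longer than 10 characters. The same idea can be sketched as a standalone generator. This is an illustrative restatement, not the project's code: `split_stream` and `min_len` are hypothetical names, and the caller would pass each yielded segment to `avatar_session.put_msg_txt`.

```python
from typing import Iterable, Iterator, Optional

# Same punctuation set as the original loop (ASCII plus full-width marks).
PUNCTUATION = ",.!;:,。!?:;"


def split_stream(chunks: Iterable[Optional[str]], min_len: int = 10) -> Iterator[str]:
    """Yield punctuation-terminated segments longer than min_len characters."""
    buffer = ""
    for msg in chunks:
        if not msg:  # skip None or empty deltas
            continue
        last = 0
        for i, ch in enumerate(msg):
            if ch in PUNCTUATION:
                buffer += msg[last:i + 1]
                last = i + 1
                if len(buffer) > min_len:
                    yield buffer
                    buffer = ""
        buffer += msg[last:]  # keep the unterminated tail for the next delta
    if buffer:
        yield buffer  # flush whatever remains at end of stream


for segment in split_stream(["Hello, wor", "ld. This is a test", "! Bye"]):
    print(repr(segment))
```

Flushing only at punctuation keeps TTS input aligned with natural pauses, while the length threshold avoids sending very short fragments.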
diff --git a/tts/doubao.py b/tts/doubao.py
index 6c1b84e5..a5305878 100644
--- a/tts/doubao.py
+++ b/tts/doubao.py
@@ -49,54 +49,59 @@ def __init__(self, opt, parent):
}
async def doubao_voice(self, text, ref_file): # -> Iterator[bytes]:
- start = time.perf_counter()
- voice_type = ref_file #self.opt.REF_FILE
+ # Mock/static return to avoid using paid Doubao service
+ logger.info(f"Mock Doubao TTS voice synthesis for text: {text}")
+ yield b'\x00' * 51200
+ return
- try:
- # 创建请求对象
- default_header = bytearray(b'\x11\x10\x11\x00')
- submit_request_json = copy.deepcopy(self.request_json)
- submit_request_json["user"]["uid"] = self.parent.sessionid
- submit_request_json["audio"]["voice_type"] = voice_type
- submit_request_json["request"]["text"] = text
- submit_request_json["request"]["reqid"] = str(uuid.uuid4())
- submit_request_json["request"]["operation"] = "submit"
- payload_bytes = str.encode(json.dumps(submit_request_json))
- payload_bytes = gzip.compress(payload_bytes) # if no compression, comment this line
- full_client_request = bytearray(default_header)
- full_client_request.extend((len(payload_bytes)).to_bytes(4, 'big')) # payload size(4 bytes)
- full_client_request.extend(payload_bytes) # payload
-
- header = {"Authorization": f"Bearer; {self.token}"}
- first = True
- async with websockets.connect(self.api_url, extra_headers=header, ping_interval=None) as ws:
- await ws.send(full_client_request)
- while True:
- res = await ws.recv()
- header_size = res[0] & 0x0f
- message_type = res[1] >> 4
- message_type_specific_flags = res[1] & 0x0f
- payload = res[header_size*4:]
-
- if message_type == 0xb: # audio-only server response
- if message_type_specific_flags == 0: # no sequence number as ACK
- #print(" Payload size: 0")
- continue
- else:
- if first:
- end = time.perf_counter()
- logger.info(f"doubao tts Time to first chunk: {end-start}s")
- first = False
- sequence_number = int.from_bytes(payload[:4], "big", signed=True)
- payload_size = int.from_bytes(payload[4:8], "big", signed=False)
- payload = payload[8:]
- yield payload
- if sequence_number < 0:
- break
- else:
- break
- except Exception as e:
- logger.exception('doubao')
+ # start = time.perf_counter()
+ # voice_type = ref_file #self.opt.REF_FILE
+ #
+ # try:
+ # # 创建请求对象
+ # default_header = bytearray(b'\x11\x10\x11\x00')
+ # submit_request_json = copy.deepcopy(self.request_json)
+ # submit_request_json["user"]["uid"] = self.parent.sessionid
+ # submit_request_json["audio"]["voice_type"] = voice_type
+ # submit_request_json["request"]["text"] = text
+ # submit_request_json["request"]["reqid"] = str(uuid.uuid4())
+ # submit_request_json["request"]["operation"] = "submit"
+ # payload_bytes = str.encode(json.dumps(submit_request_json))
+ # payload_bytes = gzip.compress(payload_bytes) # if no compression, comment this line
+ # full_client_request = bytearray(default_header)
+ # full_client_request.extend((len(payload_bytes)).to_bytes(4, 'big')) # payload size(4 bytes)
+ # full_client_request.extend(payload_bytes) # payload
+ #
+ # header = {"Authorization": f"Bearer; {self.token}"}
+ # first = True
+ # async with websockets.connect(self.api_url, extra_headers=header, ping_interval=None) as ws:
+ # await ws.send(full_client_request)
+ # while True:
+ # res = await ws.recv()
+ # header_size = res[0] & 0x0f
+ # message_type = res[1] >> 4
+ # message_type_specific_flags = res[1] & 0x0f
+ # payload = res[header_size*4:]
+ #
+ # if message_type == 0xb: # audio-only server response
+ # if message_type_specific_flags == 0: # no sequence number as ACK
+ # #print(" Payload size: 0")
+ # continue
+ # else:
+ # if first:
+ # end = time.perf_counter()
+ # logger.info(f"doubao tts Time to first chunk: {end-start}s")
+ # first = False
+ # sequence_number = int.from_bytes(payload[:4], "big", signed=True)
+ # payload_size = int.from_bytes(payload[4:8], "big", signed=False)
+ # payload = payload[8:]
+ # yield payload
+ # if sequence_number < 0:
+ # break
+ # else:
+ # break
+ # except Exception as e:
+ # logger.exception('doubao')
# # 检查响应状态码
# if response.status_code == 200:
# # 处理响应数据
diff --git a/tts/qwentts.py b/tts/qwentts.py
index 1b69edaa..25cb00f6 100644
--- a/tts/qwentts.py
+++ b/tts/qwentts.py
@@ -45,16 +45,6 @@ def __init__(self, opt, parent):
self.voice = opt.REF_FILE if opt.REF_FILE else 'Cherry'
# 模型名
self.model = getattr(opt, 'qwen_tts_model', 'qwen3-tts-flash-realtime')
- # WebSocket URL
- self.ws_url = getattr(opt, 'qwen_tts_url',
- 'wss://dashscope.aliyuncs.com/api-ws/v1/realtime')
-
- # 设置 DashScope API Key
- api_key = getattr(opt, 'dashscope_api_key', None) or os.environ.get('DASHSCOPE_API_KEY')
- if api_key:
- dashscope.api_key = api_key
- else:
- logger.warning("QwenTTS: DASHSCOPE_API_KEY 未设置,请设置环境变量或通过参数传入")
# ---------- 内部状态 ----------
self._remainder = np.array([], dtype=np.float32) # 上次重采样后不足一 chunk 的 16kHz 样本
@@ -63,94 +53,35 @@ def __init__(self, opt, parent):
self._current_text = ''
self._current_textevent = {}
- # ---------- 回调类 ----------
- tts_ref = self
-
- class _Callback(QwenTtsRealtimeCallback):
- def on_open(self) -> None:
- logger.info("QwenTTS WebSocket 连接已建立")
-
- def on_close(self, close_status_code, close_msg) -> None:
- logger.info(f"QwenTTS WebSocket 关闭: code={close_status_code}, msg={close_msg}")
- tts_ref._response_event.set()
-
- def on_event(self, response: dict) -> None:
- try:
- event_type = response.get('type', '')
-
- if event_type == 'session.created':
- logger.info(f"QwenTTS session: {response.get('session', {}).get('id', '')}")
-
- elif event_type == 'response.audio.delta':
- audio_b64 = response.get('delta', '')
- if audio_b64:
- pcm_data = base64.b64decode(audio_b64)
- tts_ref._on_audio_data(pcm_data)
-
- elif event_type == 'response.done':
- logger.info("QwenTTS response done")
- tts_ref._flush_remainder()
- tts_ref._response_event.set()
-
- elif event_type == 'error':
- logger.error(f"QwenTTS 错误: {response}")
- tts_ref._response_event.set()
-
- except Exception as e:
- logger.exception(f"QwenTTS 回调处理异常: {e}")
-
- # ---------- 建立唯一连接 ----------
- self._callback = _Callback()
- self._tts_client = QwenTtsRealtime(
- model=self.model,
- callback=self._callback,
- url=self.ws_url,
- )
- self._tts_client.connect()
- self._tts_client.update_session(
- voice=self.voice,
- response_format=AudioFormat.PCM_24000HZ_MONO_16BIT, # Qwen TTS 只支持 24kHz 输出
- sample_rate=16000,
- mode='commit',
- )
- logger.info(f"QwenTTS 初始化完成: model={self.model}, voice={self.voice}")
+ logger.info("Mock QwenTTS initialized (no remote API connection established)")
# ========================== 核心方法 ==========================
def txt_to_audio(self, msg: tuple[str, dict]):
- text, textevent = msg
- t_start = time.perf_counter()
-
- ref_file = textevent.get('tts', {}).get('ref_file',self.opt.REF_FILE)
-
- # 重置状态
- self._remainder = np.array([], dtype=np.float32)
- self._first_chunk = True
- self._current_text = text
- self._current_textevent = textevent
- self._response_event.clear()
-
try:
- #logger.info(f"QwenTTS 发送文本: {text[:80]}...")
- if ref_file != self.voice:
- logger.info(f'ref_file:{ref_file},self.voice:{self.voice}')
- self.voice=ref_file
- self._tts_client.close()
- self._tts_client.connect()
- self._tts_client.update_session(
- voice=self.voice,
- response_format=AudioFormat.PCM_24000HZ_MONO_16BIT, # Qwen TTS 只支持 24kHz 输出
- sample_rate=16000,
- mode='commit',
- )
- self._tts_client.append_text(text)
- self._tts_client.commit()
-
- # 等待 response.done(音频在回调中流式处理)
- self._response_event.wait(timeout=60)
+ text, textevent = msg
+ t_start = time.perf_counter()
+
+ logger.info(f"Mock QwenTTS synthesis for text: {text}")
+
+ # Output start frame
+ eventpoint_start = {'status': 'start', 'text': text}
+ eventpoint_start.update(**textevent)
+ self.parent.put_audio_frame(np.zeros(self.chunk, np.float32), eventpoint_start)
+
+ # Output mock silence
+ for _ in range(10):
+ if self.state != State.RUNNING:
+ break
+ self.parent.put_audio_frame(np.zeros(self.chunk, np.float32), textevent)
+
+ # Output end frame
+ eventpoint_end = {'status': 'end', 'text': text}
+ eventpoint_end.update(**textevent)
+ self.parent.put_audio_frame(np.zeros(self.chunk, np.float32), eventpoint_end)
t_end = time.perf_counter()
- logger.info(f"QwenTTS 合成完成,耗时: {t_end - t_start:.2f}s")
+ logger.info(f"Mock QwenTTS synthesis completed, time: {t_end - t_start:.2f}s")
except Exception as e:
logger.exception(f"QwenTTS txt_to_audio 异常: {e}")
diff --git a/tts/tencent.py b/tts/tencent.py
index b283596f..4584cd1c 100644
--- a/tts/tencent.py
+++ b/tts/tencent.py
@@ -78,41 +78,46 @@ def txt_to_audio(self,msg:tuple[str, dict]):
)
def tencent_voice(self, text, reffile, reftext,language, server_url) -> Iterator[bytes]:
- start = time.perf_counter()
- session_id = str(uuid.uuid1())
- params = self.__gen_params(session_id, text, reffile)
- signature = self.__gen_signature(params)
- headers = {
- "Content-Type": "application/json",
- "Authorization": str(signature)
- }
- url = _PROTOCOL + _HOST + _PATH
- try:
- res = requests.post(url, headers=headers,
- data=json.dumps(params), stream=True)
-
- end = time.perf_counter()
- logger.info(f"tencent Time to make POST: {end-start}s")
-
- first = True
-
- for chunk in res.iter_content(chunk_size=6400): # 640 16K*20ms*2
- #logger.info('chunk len:%d',len(chunk))
- if first:
- try:
- rsp = json.loads(chunk)
- #response["Code"] = rsp["Response"]["Error"]["Code"]
- #response["Message"] = rsp["Response"]["Error"]["Message"]
- logger.error("tencent tts:%s",rsp["Response"]["Error"]["Message"])
- return
- except:
- end = time.perf_counter()
- logger.info(f"tencent Time to first chunk: {end-start}s")
- first = False
- if chunk and self.state==State.RUNNING:
- yield chunk
- except Exception as e:
- logger.exception('tencent')
+ # Mock/static return to avoid using paid Tencent service
+ logger.info(f"Mock Tencent TTS voice synthesis for text: {text}")
+ yield b'\x00' * 51200
+ return
+
+ # start = time.perf_counter()
+ # session_id = str(uuid.uuid1())
+ # params = self.__gen_params(session_id, text, reffile)
+ # signature = self.__gen_signature(params)
+ # headers = {
+ # "Content-Type": "application/json",
+ # "Authorization": str(signature)
+ # }
+ # url = _PROTOCOL + _HOST + _PATH
+ # try:
+ # res = requests.post(url, headers=headers,
+ # data=json.dumps(params), stream=True)
+ #
+ # end = time.perf_counter()
+ # logger.info(f"tencent Time to make POST: {end-start}s")
+ #
+ # first = True
+ #
+ # for chunk in res.iter_content(chunk_size=6400): # 640 16K*20ms*2
+ # #logger.info('chunk len:%d',len(chunk))
+ # if first:
+ # try:
+ # rsp = json.loads(chunk)
+ # #response["Code"] = rsp["Response"]["Error"]["Code"]
+ # #response["Message"] = rsp["Response"]["Error"]["Message"]
+ # logger.error("tencent tts:%s",rsp["Response"]["Error"]["Message"])
+ # return
+ # except:
+ # end = time.perf_counter()
+ # logger.info(f"tencent Time to first chunk: {end-start}s")
+ # first = False
+ # if chunk and self.state==State.RUNNING:
+ # yield chunk
+ # except Exception as e:
+ # logger.exception('tencent')
def stream_tts(self,audio_stream,msg:tuple[str, dict]):
text,textevent = msg
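
The mock above yields a single block of 51200 zero bytes. Assuming the 16 kHz, 16-bit mono PCM format implied by the `640 16K*20ms*2` comment in the replaced code (an assumption, since the diff does not state the format explicitly), the arithmetic works out as in this sketch. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # assumed from "16K" in the original comment
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
CHUNK_MS = 20            # frame length from the original comment


def chunk_size_bytes() -> int:
    """Bytes in one 20 ms frame: 16000 * 0.020 * 2 = 640."""
    return SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback duration of a raw PCM buffer under the assumed format."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def split_pcm(data: bytes, size: int) -> list[bytes]:
    """Split a PCM buffer into consecutive frames of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


silence = b"\x00" * 51200
frames = split_pcm(silence, chunk_size_bytes())
print(chunk_size_bytes(), pcm_duration_seconds(len(silence)), len(frames))
```

Under these assumptions the mock produces 1.6 seconds of silence, equal to 80 frames of 20 ms each.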
diff --git a/web/admin.html b/web/admin.html
index 534ae389..6ebd8406 100644
--- a/web/admin.html
+++ b/web/admin.html
@@ -1,10 +1,10 @@
-
+
- 后台管理系统 - LiveTalking
+ Admin Console - LiveTalking
@@ -119,17 +119,9 @@
}
@keyframes pulse {
- 0% {
- box-shadow: 0 0 0 0 rgba(239, 68, 68, 0.4);
- }
-
- 70% {
- box-shadow: 0 0 0 6px rgba(239, 68, 68, 0);
- }
-
- 100% {
- box-shadow: 0 0 0 0 rgba(239, 68, 68, 0);
- }
+ 0% { box-shadow: 0 0 0 0 rgba(239, 68, 68, 0.4); }
+ 70% { box-shadow: 0 0 0 6px rgba(239, 68, 68, 0); }
+ 100% { box-shadow: 0 0 0 0 rgba(239, 68, 68, 0); }
}
.session-detail-item {
@@ -159,7 +151,7 @@
}
.skeleton {
- background: linear-gradient(90deg, #f0f0f0 25%, #e0e0e0 50%, #f0f0f0 75%);
+ background: linear-gradient(90deg, #f0f0f0 25%, #e0eafc 50%, #f0f0f0 75%);
background-size: 200% 100%;
animation: loading 1.5s infinite;
border-radius: 4px;
@@ -167,13 +159,8 @@
}
@keyframes loading {
- 0% {
- background-position: 200% 0;
- }
-
- 100% {
- background-position: -200% 0;
- }
+ 0% { background-position: 200% 0; }
+ 100% { background-position: -200% 0; }
}
.empty-state {
@@ -194,9 +181,9 @@