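Abstract: This article uses the ATOMS3R-CAM AI CHATBOT development board and its voice base to build a device that automatically recognizes speech in more than 100 languages, prints the recognized text and its translation to the serial port, and plays the translation through the onboard speaker.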

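I. Hardware Overview
Product features
The AtomS3R-CAM AI Chatbot kit consists of two core parts: a controller and a voice base.
The controller is the AtomS3R-CAM (integrating a 0.3 MP GC0308 camera, a 9-axis IMU, an IR emitter, and more).
The voice base is the Atomic Voice Base (built on the ES8311 audio codec).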

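Key features
Integrated ESP32-S3-PICO-1-N8R8 main controller
0.3 MP GC0308 camera
9-axis sensor system
8 MB Flash and 8 MB PSRAM
Integrated IR emitter
Expandable pins and interfaces
Full-duplex I2S audio
MEMS digital microphone
Functional pin diagram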

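II. Implementation
1. Hardware
The Atomic Voice Base is a speech-recognition base designed for M5 Atom series controllers. It uses an ES8311 mono audio codec, a MEMS microphone, and an NS4150B power amplifier, and supports full-duplex communication.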


Main schematic
MIC / Speaker: the Atomic Voice Base microphone supports sample rates from 16 kHz to 64 kHz.

Key features

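2. Design
Project overview: the microphone on the voice base captures speech in multiple languages (Chinese, English, Japanese, and others). The captured audio is sent over the ESP32-S3's Wi-Fi connection to the corresponding iFLYTEK platform APIs.
The speech is first converted into text, and the translation API is then called to translate the source-language text into the target language (zh).
Finally, the TTS rendering of the translation is played through the speaker on the voice base, while the recognized source text and the translated text are printed to the serial port.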
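The three stages described above can be sketched as a small host-side C++ function. This is only an illustration of the control flow, not the firmware itself: `PipelineResult` and `runPipeline` are hypothetical names, and the three stages are passed in as callbacks so that the sketch runs without any hardware or cloud service. It assumes a stage that returns an empty string should stop the pipeline.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical container for what one session produces.
struct PipelineResult {
    std::string recognized;  // text returned by speech recognition
    std::string translated;  // text returned by machine translation
    bool spoken;             // true if the translation was played back
};

// Runs recognition, then translation, then playback.
// Each stage is skipped when the previous one produced nothing.
PipelineResult runPipeline(
    const std::function<std::string()> &recognize,
    const std::function<std::string(const std::string &)> &translate,
    const std::function<bool(const std::string &)> &speak) {
    PipelineResult r{"", "", false};
    r.recognized = recognize();
    if (r.recognized.empty()) return r;  // nothing heard: stop here
    r.translated = translate(r.recognized);
    if (r.translated.empty()) return r;  // nothing to play back
    r.spoken = speak(r.translated);
    return r;
}
```

On the board, the three callbacks would correspond to the recognition, translation, and TTS steps of the sketch listed in section III.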

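3. Implementation
3.1 Enabling the services
The project uses APIs from the iFLYTEK Open Platform.
(You need to register and enable the relevant services to obtain the APPID, API key, and so on.)
After registering, open the console and create a new application, to which the individual services are then added.
(The left-hand panel lists the main service categories; all of them can be used for free.)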


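3.2 Role of each service
1. Speech recognition (IAT): records the user's speech through the microphone on the voice base, recognizes multilingual audio (≤ 60 seconds) as text, returns the text in real time, and prints it to the serial port.
2. Translation (NiuTrans): machine-translates the recognized text and prints the translation to the serial port.
3. Speech synthesis (TTS): synthesizes speech from the translation and plays it through the speaker on the voice base.
3.3 Service API overview
API overview (speech recognition, IAT): recognizes short multilingual audio (≤ 60 seconds) as text and returns the text in real time, faithfully reproducing the spoken content.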

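API requirements:

Supported languages:
Chinese, English, Japanese, Korean, Russian, French, Spanish, Arabic, German, Thai, Vietnamese, Hindi, Portuguese, Italian, Malay, Indonesian, Filipino (Tagalog), Turkish, Greek, Czech, Urdu, Bengali, Tamil, Ukrainian, Kazakh, Uzbek, Polish, Mongolian (Cyrillic), Swahili, Hausa, Persian, Dutch, Swedish, Romanian, Bulgarian, Uyghur, Tibetan, Armenian, Sinhala, Burmese, Lao, Hebrew, Nepali, Pashto, Tajik, Turkmen, Georgian, Azerbaijani, Serbian, Hungarian, Khmer, Croatian, Lithuanian, Slovak, Latvian, Slovenian, Danish, Finnish, Norwegian, Telugu, Javanese, Catalan, Icelandic, Amharic, Afrikaans, Malayalam, Sundanese, Zulu, Marathi, Mongolian (Inner Mongolia), Chinese Korean (Chaoxianzu), Kinyarwanda, Kabyle, Luganda, Belarusian, Yi, Welsh, Basque, Tatar, Assamese, Kurdish, Macedonian, Estonian, Malagasy, Somali, Yoruba, Bashkir, Haitian Creole, Lingala, Sindhi, Maltese, Luxembourgish, Albanian, Galician, Kyrgyz, Gujarati, Kannada, Punjabi, Irish, Bhojpuri, Flemish.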
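API overview (translation, NiuTrans): built on NiuTrans's in-house multilingual machine translation engine, which supports more than 100 languages, including English, Japanese, Korean, French, Spanish, and Russian, and translates source-language text into the target language.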

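API requirements:

Supported languages:
Chinese (Simplified), Chinese (Traditional), English, Japanese, Korean, Russian, French, Spanish, Arabic, Portuguese, Afrikaans, Amharic, Azerbaijani, Bashkir, Belarusian, Bemba, Bulgarian, Bislama, Bengali, Bosnian, Catalan, Cebuano, Corsican, Seychellois Creole, Czech, Welsh, Danish, German, Ewe, Greek, Esperanto, Estonian, Basque, Persian, Finnish, Filipino, Fijian, Frisian, Irish, Scottish Gaelic, Galician, Gujarati, Hausa, Hawaiian, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Igbo, Icelandic, Italian, Javanese, Georgian, Kazakh, Q'eqchi', Kongo, Kazakh (Cyrillic), Khmer, Kannada, Kurdish, Kyrgyz, Latin, Luxembourgish, Luganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Mari, Maori, Macedonian, Malayalam, Mongolian (Cyrillic), Marathi, Hill Mari, Malay, Maltese, White Hmong, Burmese, Norwegian Bokmål, Nepali, Dutch, Norwegian, Chichewa, Oromo, Ossetian, Querétaro Otomi, Punjabi, Papiamento, Polish, Pashto, Kirundi, Romanian, Kinyarwanda, Sindhi, Sango, Sinhala, Slovak, Slovenian, Samoan, Shona, Somali, Albanian, Serbian, Sesotho, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Tswana, Thai, Tibetan, Tigrinya, Turkmen, Tongan, Tok Pisin, Turkish, Tsonga, Tatar, Twi, Tahitian, Udmurt, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Waray, Xhosa, Yiddish, Yoruba, Yucatec Maya, Cantonese, Zulu.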
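API overview (speech synthesis, long-text TTS): long-text speech synthesis quickly synthesizes very large texts (tens of thousands of characters) in a single request.
Up to about 100,000 characters per request;
Fast synthesis, with configurable speed, pitch, and volume;
Chinese and English, with male and female voices;
Output in pcm, mp3, speex, opus, and other encodings;
Results can be retrieved by polling or through a service callback;
Pinyin annotation is supported;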

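API requirements:

Synthesis text language:
zh: Chinese (default)
en: English
Voice selection:
Voices can be managed on the console page, where voices with different sound characteristics can be added.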



3.4 The finished build
Assembled hardware

Main flow chart

III. Code
Speech recognition (IAT)
Parameters:
static const char *IAT_HOST = "iat.cn-huabei-1.xf-yun.com";
static const uint16_t IAT_PORT = 443;
static const char *IAT_PATH = "/v1";
static const char *IAT_DOMAIN = "slm";
static const char *IAT_LANGUAGE = "mul_cn";
static const char *IAT_ACCENT = "mandarin";
static const int IAT_EOS_MS = 2000;
Auth URL construction: rfc1123_gmt → hmacSha256Base64 → urlEncode → buildAuthUrl()
WebSocket connection: ws.setReconnectInterval(500) → ws.beginSslWithCA() → ws.onEvent()
Record-and-send loop: echobase.record() → stereo_to_mono_left() → base64EncodeFixed() → snprintf (build JSON) → ws.sendTXT() (send audio frame)
Result handling: WStype_TEXT → parseResponse() → base64DecodeToString() → extractWsCwText() → finalText += (append recognized text)
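The first two links of the auth chain, the RFC 1123 date and the URL encoding, are elided in the listing further down. As a rough, host-side illustration of what such helpers typically do, here is a minimal sketch in standard C++. The names `rfc1123Date` and `percentEncode` are hypothetical stand-ins and not the author's `rfc1123_gmt` / `urlEncode`; the set of characters left unescaped follows RFC 3986 and is an assumption about what the service accepts.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <ctime>
#include <string>

// Formats a UTC timestamp as an RFC 1123 date such as
// "Thu, 01 Jan 1970 00:00:00 GMT". Day and month names are spelled out
// by hand so the result does not depend on the C locale.
std::string rfc1123Date(std::time_t t) {
    static const char *kDays[] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    static const char *kMonths[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    const std::tm *utc = std::gmtime(&t);
    if (utc == nullptr) return "";
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s, %02d %s %04d %02d:%02d:%02d GMT",
                  kDays[utc->tm_wday], utc->tm_mday, kMonths[utc->tm_mon],
                  utc->tm_year + 1900, utc->tm_hour, utc->tm_min, utc->tm_sec);
    return std::string(buf);
}

// Percent-encodes every byte except the RFC 3986 unreserved characters
// (letters, digits, '-', '_', '.', '~').
std::string percentEncode(const std::string &s) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Because the date string contains spaces, a comma, and colons, and the Base64 authorization string can contain `+`, `/`, and `=`, both have to be percent-encoded before they are placed in the query string.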
Text translation (NiuTrans)
Parameters:
static const char *NT_HOST = "ntrans.xfyun.cn";
static const char *NT_PATH = "/v2/ots";
static const char *NT_FROM = "auto"; // Auto-detect the source language
static const char *NT_TO = "cn"; // Target language: Chinese
Request body: mbedtls_base64_encode(text) → snprintf (build JSON reqBody)
Auth header: rfc1123_gmt → sha256Base64(reqBody) → hmacSha256Base64(sigOrigin) → authHeader
HTTP POST: WiFiClientSecure.connect() → readStringUntil('\n') (parse Content-Length) → readBytes()
Result parsing: deserializeJson() → doc["data"]["result"]["trans_result"]["dst"] → (return the translation)
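To make the request-body step concrete, the following host-side sketch Base64-encodes the text and wraps it in the same JSON layout that `translateText` builds with `snprintf`. It is an illustration only: `base64Encode` and `buildTranslateBody` are hypothetical names, the encoder is a plain RFC 4648 implementation standing in for `mbedtls_base64_encode`, and the field names are copied from the listing in this article rather than from the service documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 (RFC 4648) with '=' padding.
std::string base64Encode(const std::string &in) {
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    while (i + 2 < in.size()) {
        std::uint32_t v = (static_cast<std::uint8_t>(in[i]) << 16) |
                          (static_cast<std::uint8_t>(in[i + 1]) << 8) |
                          static_cast<std::uint8_t>(in[i + 2]);
        out += kTable[(v >> 18) & 63];
        out += kTable[(v >> 12) & 63];
        out += kTable[(v >> 6) & 63];
        out += kTable[v & 63];
        i += 3;
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    std::size_t rem = in.size() - i;
    if (rem == 1) {
        std::uint32_t v = static_cast<std::uint8_t>(in[i]) << 16;
        out += kTable[(v >> 18) & 63];
        out += kTable[(v >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        std::uint32_t v = (static_cast<std::uint8_t>(in[i]) << 16) |
                          (static_cast<std::uint8_t>(in[i + 1]) << 8);
        out += kTable[(v >> 18) & 63];
        out += kTable[(v >> 12) & 63];
        out += kTable[(v >> 6) & 63];
        out += '=';
    }
    return out;
}

// Builds the JSON body; the text is Base64-encoded, so it needs no JSON escaping.
std::string buildTranslateBody(const std::string &appId, const std::string &from,
                               const std::string &to, const std::string &text) {
    return "{\"common\":{\"app_id\":\"" + appId + "\"},"
           "\"business\":{\"from\":\"" + from + "\",\"to\":\"" + to + "\"},"
           "\"data\":{\"text\":\"" + base64Encode(text) + "\"}}";
}
```

One consequence worth keeping in mind is that Base64 grows the payload by about one third, which is why the firmware's request buffer is larger than its text buffer.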
Speech playback (TTS)
Parameters:
static const char *TTS_HOST = "api-dx.xf-yun.com";
static const char *TTS_CREATE = "/v1/private/dts_create";
static const char *TTS_QUERY = "/v1/private/dts_query";
static const char *TTS_VCN = "x4_yeting"; // Voice (can be changed in the console)
Create task: mbedtls_base64_encode(text) → snprintf (build JSON createBody) → ttsHttpPost(TTS_CREATE) → (parse task_id)
Poll task: ttsHttpPost(TTS_QUERY) → payload.audio.audio (download URL)
Download PCM: mbedtls_base64_decode(url) → WiFiClientSecure.connect() → dc.printf("GET ...") → dc.readBytes()
Audio playback: echobase.setSpeakerVolume() → echobase.play()
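Two small steps in this chain are easy to get wrong: splitting the decoded download URL into host and path, and expanding the mono PCM from the service into the stereo frames that the playback call is given. The sketch below shows both in portable C++. `UrlParts`, `splitUrl`, and `monoToStereo` are hypothetical names; like the firmware code, the URL splitter assumes no explicit port or user info in the URL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Host and path (including any query string) extracted from a URL.
struct UrlParts {
    std::string host;
    std::string path;
    bool ok;  // false when the input has no "scheme://" prefix or no host
};

UrlParts splitUrl(const std::string &url) {
    UrlParts parts{"", "/", false};
    std::size_t scheme = url.find("://");
    if (scheme == std::string::npos) return parts;  // reject instead of guessing
    std::size_t hostStart = scheme + 3;
    std::size_t pathStart = url.find('/', hostStart);
    if (pathStart == std::string::npos) {
        parts.host = url.substr(hostStart);  // no path: keep the default "/"
    } else {
        parts.host = url.substr(hostStart, pathStart - hostStart);
        parts.path = url.substr(pathStart);
    }
    parts.ok = !parts.host.empty();
    return parts;
}

// Duplicates each mono sample into the left and right channel.
std::vector<std::int16_t> monoToStereo(const std::vector<std::int16_t> &mono) {
    std::vector<std::int16_t> stereo(mono.size() * 2);
    for (std::size_t i = 0; i < mono.size(); i++) {
        stereo[i * 2] = mono[i];      // left
        stereo[i * 2 + 1] = mono[i];  // right
    }
    return stereo;
}
```

Checking for a missing `://` before slicing avoids treating a malformed string as a host name, and the stereo buffer is always exactly twice the size of the mono input.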
Main code
Audio recording / playback parameters: 16 kHz, 16-bit, PCM
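These parameters fix the buffer sizes used in the listing. The sketch below redoes the arithmetic in portable C++ so the constants can be checked: a 40 ms chunk at 16 kHz and 16 bits is 1280 bytes in mono and 2560 bytes in stereo, and its Base64 form needs 1708 characters. `pcmChunkBytes` and `base64EncodedLen` are hypothetical helper names; the 40 ms chunk length and the 1720-byte Base64 buffer are taken from the constants in the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes of raw PCM in one chunk of chunkMs milliseconds.
constexpr std::size_t pcmChunkBytes(std::uint32_t sampleRate, std::uint32_t channels,
                                    std::uint32_t bitsPerSample, std::uint32_t chunkMs) {
    return static_cast<std::size_t>(sampleRate) * channels * (bitsPerSample / 8) *
           chunkMs / 1000;
}

// Characters produced by padded Base64 for rawBytes input bytes
// (the terminating NUL is not counted).
constexpr std::size_t base64EncodedLen(std::size_t rawBytes) {
    return ((rawBytes + 2) / 3) * 4;
}

// Compile-time checks against the constants used in the sketch.
static_assert(pcmChunkBytes(16000, 1, 16, 40) == 1280, "mono chunk size");
static_assert(pcmChunkBytes(16000, 2, 16, 40) == 2560, "stereo chunk size");
static_assert(base64EncodedLen(1280) + 1 <= 1720, "encoded chunk fits the Base64 buffer");
```

With a 60-second session limit and 40 ms chunks, one session sends at most 1500 audio frames.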
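#include <M5Unified.h>
#include <M5EchoBase.h>
#include <WiFi.h>
#include <WiFiClientSecure.h> // WiFiClientSecure is used directly by the translation and TTS requests
#include <WebSocketsClient.h>
#include <ArduinoJson.h>
#include <time.h>
#include "mbedtls/md.h"
#include "mbedtls/base64.h"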
// ---------------- WiFi ----------------
static const char* WIFI_SSID = "YOUR_WIFI_SSID";
static const char* WIFI_PASS = "YOUR_WIFI_PASSWORD";
// ================== iFLYTEK API credentials ==================
static const char* XFYUN_APPID = "621e12ac";
static const char* XFYUN_APIKEY = "227b6123cc9ee3fd8b4e2c";
static const char* XFYUN_APISECRET = "M1CxODU1Njc0ZWQ3YmFlO1cc2";
#define DEBUG_LOG 0
#if DEBUG_LOG
#define DPRINT(...) Serial.print(__VA_ARGS__)
#define DPRINTF(...) Serial.printf(__VA_ARGS__)
#define DPRINTLN(...) Serial.println(__VA_ARGS__)
#else
#define DPRINT(...)
#define DPRINTF(...)
#define DPRINTLN(...)
#endif
// --- IAT (speech recognition) ---
static const char *IAT_HOST = "iat.cn-huabei-1.xf-yun.com";
static const uint16_t IAT_PORT = 443;
static const char *IAT_PATH = "/v1";
static const char *IAT_DOMAIN = "slm";
static const char *IAT_LANGUAGE = "mul_cn";
static const char *IAT_ACCENT = "mandarin";
static const int IAT_EOS_MS = 2000; // Milliseconds of silence after which recognition stops
static const char *RESULT_ENCODING = "utf8";
static const char *RESULT_COMPRESS = "raw";
static const char *RESULT_FORMAT = "json";
// --- NiuTrans (machine translation) ---
static const char *NT_HOST = "ntrans.xfyun.cn";
static const char *NT_PATH = "/v2/ots";
static const char *NT_FROM = "auto"; // Source language (auto-detect)
static const char *NT_TO = "cn"; // Target language
// --- TTS (speech synthesis) ---
static const char *TTS_HOST = "api-dx.xf-yun.com";
static const char *TTS_CREATE = "/v1/private/dts_create"; // Create task
static const char *TTS_QUERY = "/v1/private/dts_query"; // Query task
static const char *TTS_VCN = "x4_yeting"; // Voice (optional)
// ================== Audio ==================
#define SAMPLE_RATE 16000
static constexpr uint32_t CHUNK_MS = 40;
static constexpr uint32_t MAX_MS = 60000;
static constexpr size_t STEREO_BYTES = (SAMPLE_RATE * 4 * CHUNK_MS) / 1000; // stereo16
static constexpr size_t MONO_BYTES = (SAMPLE_RATE * 2 * CHUNK_MS) / 1000; // mono16 (1280)
static constexpr size_t B64_CAP = 1720;
M5EchoBase echobase;
WebSocketsClient ws;
static volatile bool wsConnected = false;
static volatile bool sessionDone = false;
static String finalText;
// Fixed buffer for outgoing JSON
static char sendBuf[4096];
// ================== NTP time sync ==================
static void syncTimeNtpOrDie()
{
...
}
// RFC 1123 time format
static String rfc1123_gmt(time_t t)
{
...
}
// URL encoding
static String urlEncode(const String &s)
{
...
return out;
}
// ================== HMAC-SHA256 signing / Base64 encoding / Base64 decoding ==================
static String hmacSha256Base64(const String &key, const String &msg)
{
...
}
static bool base64EncodeFixed(const uint8_t *data, size_t len, char *out, size_t outCap)
{
...
}
static bool base64DecodeToString(const char *b64, String &out)
{
...
}
static void stereo_to_mono_left(const uint8_t *inStereo, uint8_t *outMono, size_t stereoBytes)
{
const int16_t *s = (const int16_t *)inStereo;
int16_t *d = (int16_t *)outMono;
size_t frames = stereoBytes / 4;
for (size_t i = 0; i < frames; i++)
d[i] = s[i * 2]; // Keep the left channel only
}
// ================== NiuTrans translation ==================
static String sha256Base64(const String &data)
{
uint8_t hash[32];
mbedtls_md_context_t ctx;
mbedtls_md_init(&ctx);
mbedtls_md_setup(&ctx, mbedtls_md_info_from_type(MBEDTLS_MD_SHA256), 0);
mbedtls_md_starts(&ctx);
mbedtls_md_update(&ctx, (const unsigned char *)data.c_str(), data.length());
mbedtls_md_finish(&ctx, hash);
mbedtls_md_free(&ctx);
char b64[64];
size_t outLen = 0;
mbedtls_base64_encode((unsigned char *)b64, sizeof(b64), &outLen, hash, 32);
b64[outLen] = 0;
return String("SHA-256=") + String(b64);
}
static String translateText(const String &text)
{
if (text.length() == 0)
return "(empty)";
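static char textB64[8192];
size_t tLen = 0;
// On failure mbedtls stores the required size in tLen, so check the result before using tLen as an index
if (mbedtls_base64_encode((unsigned char *)textB64, sizeof(textB64), &tLen,
(const unsigned char *)text.c_str(), text.length()) != 0)
return "(text too long)";
textB64[tLen] = 0;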
static char reqBody[8700];
int bodyLen = snprintf(reqBody, sizeof(reqBody),
"{\"common\":{\"app_id\":\"%s\"},"
"\"business\":{\"from\":\"%s\",\"to\":\"%s\"},"
"\"data\":{\"text\":\"%s\"}}",
XFYUN_APPID, NT_FROM, NT_TO, textB64);
String date = rfc1123_gmt(time(nullptr));
String digest = sha256Base64(String(reqBody));
String sigOrigin = String("host: ") + NT_HOST + "\n" +
"date: " + date + "\n" +
"POST " + String(NT_PATH) + " HTTP/1.1\n" +
"digest: " + digest;
String sigB64 = hmacSha256Base64(String(XFYUN_APISECRET), sigOrigin);
String authHeader = String("api_key=\"") + XFYUN_APIKEY +
"\", algorithm=\"hmac-sha256\", headers=\"host date request-line digest\", signature=\"" +
sigB64 + "\"";
WiFiClientSecure client;
client.setInsecure();
if (!client.connect(NT_HOST, 443, 10000))
{
client.stop();
return "(connect failed)";
}
client.printf(
"POST %s HTTP/1.1\r\n"
"Host: %s\r\n"
"Date: %s\r\n"
"Digest: %s\r\n"
"Authorization: %s\r\n"
"Content-Type: application/json\r\n"
"Content-Length: %d\r\n"
"\r\n"
"%s",
NT_PATH, NT_HOST, date.c_str(), digest.c_str(),
authHeader.c_str(), bodyLen, reqBody);
client.setTimeout(3000);
uint32_t t0 = millis();
while (client.connected() && !client.available() && millis() - t0 < 3000)
{
delay(5);
}
int contentLen = 0;
while (client.connected() && millis() - t0 < 5000)
{
if (!client.available())
{
delay(5);
continue;
}
String line = client.readStringUntil('\n');
if (line.startsWith("Content-Length:"))
{
contentLen = line.substring(15).toInt();
}
if (line.length() <= 2)
break; // \r\n = end of headers
t0 = millis();
}
String respBody;
if (contentLen > 0 && contentLen < 65536)
{
char *buf = (char *)malloc(contentLen + 1);
if (buf)
{
client.setTimeout(3000);
size_t read = client.readBytes((uint8_t *)buf, contentLen);
buf[read] = 0;
respBody = String(buf);
free(buf);
}
}
client.stop();
JsonDocument doc;
if (deserializeJson(doc, respBody))
return "(json error)";
int code = doc["code"] | -1;
if (code != 0)
return "(api error)";
const char *dst = doc["data"]["result"]["trans_result"]["dst"] | "";
return String(dst);
}
// ================== TTS (long-text speech synthesis) ==================
static String ttsHttpAuthUrl(const char *path)
{
String date = rfc1123_gmt(time(nullptr));
String sigOrigin = String("host: ") + TTS_HOST + "\n" +
"date: " + date + "\n" +
"POST " + String(path) + " HTTP/1.1";
String sigB64 = hmacSha256Base64(String(XFYUN_APISECRET), sigOrigin);
String authOrigin = String("api_key=\"") + XFYUN_APIKEY +
"\", algorithm=\"hmac-sha256\", headers=\"host date request-line\", signature=\"" +
sigB64 + "\"";
char authB64[512];
size_t aLen = 0;
mbedtls_base64_encode((unsigned char *)authB64, sizeof(authB64), &aLen,
(const unsigned char *)authOrigin.c_str(), authOrigin.length());
authB64[aLen] = 0;
return String("http://") + TTS_HOST + path +
"?host=" + urlEncode(TTS_HOST) +
"&date=" + urlEncode(date) +
"&authorization=" + urlEncode(String(authB64));
}
static String ttsHttpPost(const char *path, const String &body)
{
WiFiClientSecure client;
client.setInsecure();
if (!client.connect(TTS_HOST, 443, 5000))
{
client.stop();
return "";
}
String authUrl = ttsHttpAuthUrl(path);
int ps = authUrl.indexOf(path);
String qs = authUrl.substring(authUrl.indexOf("?", ps));
client.printf(
"POST %s%s HTTP/1.1\r\nHost: %s\r\nContent-Type: application/json\r\nContent-Length: %d\r\n\r\n%s",
path, qs.c_str(), TTS_HOST, body.length(), body.c_str());
client.setTimeout(3000);
uint32_t t0 = millis();
while (client.connected() && !client.available() && millis() - t0 < 5000)
delay(5);
int contentLen = 0;
while (client.connected() && millis() - t0 < 5000)
{
if (!client.available())
{
delay(5);
continue;
}
String line = client.readStringUntil('\n');
if (line.startsWith("Content-Length:"))
contentLen = line.substring(15).toInt();
if (line.length() <= 2)
break;
t0 = millis();
}
String respBody;
if (contentLen > 0 && contentLen < 65536)
{
char *buf = (char *)malloc(contentLen + 1);
if (buf)
{
client.setTimeout(5000);
size_t r = client.readBytes((uint8_t *)buf, contentLen);
buf[r] = 0;
respBody = String(buf);
free(buf);
}
}
client.stop();
return respBody;
}
static bool ttsSpeak(const String &text)
{
if (text.length() == 0)
return false;
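static char textB64[8192];
size_t tLen = 0;
// On failure mbedtls stores the required size in tLen, so check the result before using tLen as an index
if (mbedtls_base64_encode((unsigned char *)textB64, sizeof(textB64), &tLen,
(const unsigned char *)text.c_str(), text.length()) != 0)
return false;
textB64[tLen] = 0;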
static char createBody[8700];
snprintf(createBody, sizeof(createBody),
"{\"header\":{\"app_id\":\"%s\"},"
"\"parameter\":{\"dts\":{"
"\"vcn\":\"%s\",\"language\":\"zh\",\"speed\":50,\"volume\":80,\"pitch\":50,"
"\"audio\":{\"encoding\":\"raw\",\"sample_rate\":16000,\"channels\":1,\"bit_depth\":16},"
"\"pybuf\":{\"encoding\":\"utf8\",\"compress\":\"raw\",\"format\":\"plain\"}"
"}},"
"\"payload\":{\"text\":{\"encoding\":\"utf8\",\"compress\":\"raw\",\"format\":\"plain\",\"text\":\"%s\"}}}",
XFYUN_APPID, TTS_VCN, textB64);
String resp = ttsHttpPost(TTS_CREATE, String(createBody));
JsonDocument doc;
if (deserializeJson(doc, resp) || (doc["header"]["code"] | -1) != 0)
return false;
const char *taskId = doc["header"]["task_id"] | "";
for (int i = 0; i < 120; i++)
{
delay(100);
static char qb[256];
snprintf(qb, sizeof(qb), "{\"header\":{\"app_id\":\"%s\",\"task_id\":\"%s\"}}", XFYUN_APPID, taskId);
String qr = ttsHttpPost(TTS_QUERY, String(qb));
JsonDocument qd;
if (deserializeJson(qd, qr))
continue;
const char *ts = qd["header"]["task_status"] | "";
if (strcmp(ts, "5") == 0)
{
const char *ab = qd["payload"]["audio"]["audio"] | "";
if (!ab || !strlen(ab))
return false;
size_t dl = (strlen(ab) * 3) / 4 + 8;
char *url = (char *)malloc(dl);
if (!url)
return false;
size_t ol = 0;
mbedtls_base64_decode((unsigned char *)url, dl, &ol, (const unsigned char *)ab, strlen(ab));
url[ol] = 0;
String u = String(url);
free(url);
int s3 = u.indexOf("://") + 3;
int p1 = u.indexOf("/", s3);
String h = u.substring(s3, p1);
String p = u.substring(p1);
WiFiClientSecure dc;
dc.setInsecure();
if (!dc.connect(h.c_str(), 443, 10000))
return false;
dc.printf("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", p.c_str(), h.c_str());
dc.setTimeout(3000);
uint32_t td = millis();
while (dc.connected() && !dc.available() && millis() - td < 5000)
delay(5);
int cl = 0;
while (dc.connected() && millis() - td < 5000)
{
if (!dc.available())
{
delay(5);
continue;
}
String l = dc.readStringUntil('\n');
if (l.startsWith("Content-Length:"))
cl = l.substring(15).toInt();
if (l.length() <= 2)
break;
td = millis();
}
uint8_t *pcm = NULL;
if (cl > 0 && cl < 524288)
{
pcm = (uint8_t *)malloc(cl);
if (pcm)
{
size_t total = 0;
dc.setTimeout(5000);
uint32_t tStart = millis();
while (total < (size_t)cl && millis() - tStart < 20000)
{
if (dc.available())
{
size_t chunk = dc.available();
if (chunk > (size_t)(cl - total))
chunk = cl - total;
size_t r = dc.readBytes(pcm + total, chunk);
if (r == 0)
break;
total += r;
}
else if (dc.connected())
{
delay(10);
}
else
{
break;
}
}
dc.stop();
if (total < 100)
{
free(pcm);
return false;
}
size_t samples = total / 2;
size_t stereoBytes = samples * 4;
uint8_t *stereo = (uint8_t *)malloc(stereoBytes);
if (!stereo)
{
free(pcm);
return false;
}
int16_t *src = (int16_t *)pcm;
int16_t *dst = (int16_t *)stereo;
for (size_t i = 0; i < samples; i++)
{
dst[i * 2] = src[i];
dst[i * 2 + 1] = src[i];
}
free(pcm);
echobase.setSpeakerVolume(80);
echobase.setMute(false);
delay(50);
echobase.play(stereo, stereoBytes);
delay(100);
echobase.setMute(true);
free(stereo);
return true;
}
}
dc.stop();
if (pcm)
free(pcm);
return false;
}
if (strcmp(ts, "2") == 0 || strcmp(ts, "4") == 0)
return false;
}
return false;
}
// ================== Auth URL construction ==================
static String buildAuthUrl()
{
time_t now = time(nullptr);
String date = rfc1123_gmt(now);
String sigOrigin = String("host: ") + IAT_HOST + "\n" +
"date: " + date + "\n" +
"GET " + String(IAT_PATH) + " HTTP/1.1";
String sigB64 = hmacSha256Base64(String(XFYUN_APISECRET), sigOrigin);
String authOrigin = String("api_key=\"") + XFYUN_APIKEY +
"\", algorithm=\"hmac-sha256\", headers=\"host date request-line\", signature=\"" +
sigB64 + "\"";
char authB64[512];
size_t aLen = 0;
mbedtls_base64_encode((unsigned char *)authB64, sizeof(authB64), &aLen,
(const unsigned char *)authOrigin.c_str(), authOrigin.length());
authB64[aLen] = 0;
String url = String("wss://") + IAT_HOST + IAT_PATH +
"?authorization=" + urlEncode(String(authB64)) +
"&date=" + urlEncode(date) +
"&host=" + urlEncode(IAT_HOST);
return url;
}
// ================== Result parsing ==================
static String extractWsCwText(JsonVariant wsArr)
{
String out;
for (JsonVariant ws1 : wsArr.as<JsonArray>())
{
for (JsonVariant cw : ws1["cw"].as<JsonArray>())
{
out += (const char *)(cw["w"] | "");
}
}
return out;
}
static void parseResponse(const String &msg)
{
JsonDocument doc;
if (deserializeJson(doc, msg))
{
DPRINTLN("[RAW]");
DPRINTLN(msg);
return;
}
int hcode = doc["header"]["code"] | -1;
const char *hmsg = doc["header"]["message"] | "";
int hstatus = doc["header"]["status"] | -1;
if (hcode != 0)
{
Serial.printf("[IAT] error code=%d message=%s\n", hcode, hmsg);
DPRINTLN(msg);
sessionDone = true;
return;
}
const char *b64Text = doc["payload"]["result"]["text"] | "";
int pstatus = doc["payload"]["result"]["status"] | -1;
if (b64Text && strlen(b64Text))
{
String innerStr;
if (!base64DecodeToString(b64Text, innerStr))
{
Serial.println("[IAT] base64 decoding failed");
DPRINTLN(msg);
sessionDone = true;
return;
}
JsonDocument inner;
if (deserializeJson(inner, innerStr))
{
Serial.println("[IAT] failed to parse inner JSON");
DPRINTLN(innerStr);
sessionDone = true;
return;
}
int ret = inner["ret"] | 0;
if (ret != 0)
{
Serial.printf("[IAT] inner error=%d\n", ret);
DPRINTLN(innerStr);
sessionDone = true;
return;
}
JsonVariant wsArr = inner["ws"];
if (!wsArr.isNull())
{
String piece = extractWsCwText(wsArr);
if (piece.length())
{
finalText += piece;
}
}
else
{
DPRINTLN("[INNER JSON]");
DPRINTLN(innerStr);
}
}
if (hstatus == 2 || pstatus == 2)
{
sessionDone = true;
}
}
void webSocketEvent(WStype_t type, uint8_t *payload, size_t length)
{
switch (type)
{
case WStype_CONNECTED:
wsConnected = true;
break;
case WStype_DISCONNECTED:
wsConnected = false;
sessionDone = true; // Disconnected mid-recording: end the current session immediately
if (DEBUG_LOG && payload && length)
{
Serial.printf("WebSocket disconnect reason: %.*s\n", (int)length, (const char *)payload);
}
break;
case WStype_TEXT:
{
String msg((const char *)payload, length);
parseResponse(msg);
break;
}
default:
break;
}
}
void setup()
{
auto cfg = M5.config();
cfg.serial_baudrate = 115200;
M5.begin(cfg);
delay(100);
WiFi.mode(WIFI_STA);
WiFi.begin(WIFI_SSID, WIFI_PASS);
while (WiFi.status() != WL_CONNECTED)
delay(200);
Serial.printf("WiFi connected: %s\n", WiFi.localIP().toString().c_str());
syncTimeNtpOrDie();
Serial.print("Initializing EchoBase... ");
if (!echobase.init(SAMPLE_RATE, 38, 39, 7, 6, 5, 8, Wire))
{
Serial.println("failed");
while (true)
delay(1000);
}
Serial.println("OK");
echobase.setMute(true);
Serial.println("Press any key to start recording; press any key again to stop");
}
// Triggered by a key press over the serial port
static bool waitStartCommand()
{
if (!Serial.available())
return false;
while (Serial.available())
Serial.read(); // Drain the buffer: any character triggers one session
return true;
}
void loop()
{
ws.loop();
if (!waitStartCommand())
{
delay(5);
return;
}
ws.disconnect();
for (int i = 0; i < 20; i++)
{
ws.loop();
delay(10);
}
wsConnected = false;
sessionDone = false;
finalText = "";
String url = buildAuthUrl();
int idx = url.indexOf(IAT_PATH);
String pathAndQuery = url.substring(idx); // "/v1?authorization=...&date=...&host=..."
DPRINTF("WS pathAndQuery len=%u\n", (unsigned)pathAndQuery.length());
// connect
ws.setReconnectInterval(500);
ws.beginSslWithCA(IAT_HOST, IAT_PORT, pathAndQuery.c_str(), NULL);
ws.onEvent(webSocketEvent);
uint32_t t0 = millis();
while (!wsConnected && millis() - t0 < 15000)
{
ws.loop();
delay(10);
}
if (!wsConnected)
{
Serial.println("WebSocket connection timed out");
ws.disconnect();
return;
}
ws.setReconnectInterval(600000);
static uint8_t *stereoBuf = NULL;
static uint8_t *monoBuf = NULL;
if (!stereoBuf)
stereoBuf = (uint8_t *)heap_caps_malloc(STEREO_BYTES, MALLOC_CAP_SPIRAM);
if (!monoBuf)
monoBuf = (uint8_t *)heap_caps_malloc(MONO_BYTES, MALLOC_CAP_SPIRAM);
if (!stereoBuf || !monoBuf)
{
Serial.println("Memory allocation failed");
ws.disconnect();
return;
}
static char b64buf[B64_CAP];
Serial.println("Recording... (press any key to stop)");
bool first = true;
int seq = 1;
uint32_t sentMs = 0;
uint32_t nextTick = millis();
while (!sessionDone && sentMs < MAX_MS)
{
while ((int32_t)(millis() - nextTick) < 0)
{
ws.loop();
delay(1);
}
nextTick += CHUNK_MS;
ws.loop();
if (!echobase.record(stereoBuf, (int)STEREO_BYTES))
break;
stereo_to_mono_left(stereoBuf, monoBuf, STEREO_BYTES);
if (!base64EncodeFixed(monoBuf, MONO_BYTES, b64buf, sizeof(b64buf)))
{
Serial.println("[IAT] base64 encoding failed");
break;
}
// ===== Build the JSON in the fixed buffer =====
int n = 0;
if (first)
{
n = snprintf(sendBuf, sizeof(sendBuf),
"{\"header\":{\"app_id\":\"%s\",\"status\":0},"
"\"parameter\":{\"iat\":{"
"\"domain\":\"%s\",\"language\":\"%s\",\"accent\":\"%s\",\"eos\":%d,"
"\"result\":{\"encoding\":\"%s\",\"compress\":\"%s\",\"format\":\"%s\"}"
"}},"
"\"payload\":{\"audio\":{"
"\"encoding\":\"raw\",\"sample_rate\":%d,\"channels\":1,\"bit_depth\":16,"
"\"seq\":%d,\"status\":0,\"audio\":\"%s\""
"}}}",
XFYUN_APPID,
IAT_DOMAIN, IAT_LANGUAGE, IAT_ACCENT, IAT_EOS_MS,
RESULT_ENCODING, RESULT_COMPRESS, RESULT_FORMAT,
SAMPLE_RATE,
seq++, b64buf);
first = false;
}
else
{
n = snprintf(sendBuf, sizeof(sendBuf),
"{\"header\":{\"app_id\":\"%s\",\"status\":1},"
"\"payload\":{\"audio\":{"
"\"encoding\":\"raw\",\"sample_rate\":%d,\"channels\":1,\"bit_depth\":16,"
"\"seq\":%d,\"status\":1,\"audio\":\"%s\""
"}}}",
XFYUN_APPID,
SAMPLE_RATE,
seq++, b64buf);
}
if (n <= 0 || (size_t)n >= sizeof(sendBuf))
{
Serial.println("[IAT] send buffer overflow");
break;
}
ws.sendTXT(sendBuf, (size_t)n);
sentMs += CHUNK_MS;
if (Serial.available())
{
while (Serial.available())
Serial.read();
break;
}
}
{
int n = snprintf(sendBuf, sizeof(sendBuf),
"{\"header\":{\"app_id\":\"%s\",\"status\":2},"
"\"payload\":{\"audio\":{"
"\"encoding\":\"raw\",\"sample_rate\":%d,\"channels\":1,\"bit_depth\":16,"
"\"seq\":%d,\"status\":2,\"audio\":\"\""
"}}}",
XFYUN_APPID, SAMPLE_RATE, seq++);
if (n > 0 && (size_t)n < sizeof(sendBuf))
{
ws.sendTXT(sendBuf, (size_t)n);
}
}
uint32_t t1 = millis();
while (!sessionDone && millis() - t1 < 10000)
{
ws.loop();
delay(10);
}
ws.disconnect();
delay(100);
Serial.println("\n==== Recognized text ====");
Serial.println(finalText.length() ? finalText : "(empty)");
Serial.println("====================");
// NiuTrans translation
if (finalText.length())
{
String translated = translateText(finalText);
Serial.println("\n==== Translation ====");
Serial.println(translated);
Serial.println("====================");
Serial.println();
// TTS playback
if (!ttsSpeak(translated))
Serial.println("[TTS] failed");
}
Serial.println("Press any key to start recording; press any key again to stop");
}
IV. Flashing the program
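1. Connect the development board with a USB data cable.
2. Select the board and its corresponding serial port.
3. Click Upload to flash the program to the board.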

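V. Demo
The device automatically identifies the spoken language and translates it into the target language, as both text and audio.
(The demo shows five languages translated individually, plus translation of mixed-language speech.)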
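VI. Summary
Cloud API options for this project included Tencent, Alibaba, and Baidu. iFLYTEK was chosen in the end because its platform concentrates on audio and video conversion services and its API call flow is comparatively straightforward.
The most critical step in the pipeline is speech-to-text: poor recognition accuracy directly degrades the translation and playback that follow.
Testing showed that the main factor affecting recognition is how standard the speaker's pronunciation is. For example, an AI-generated foreign (en) voice speaking Chinese, or accented articulation, reduces accuracy considerably.
(That said, speech with reasonably standard pronunciation was generally recognized accurately.)
If the board also had extra buttons or a screen, the content could be shown directly on the display, which would probably work better.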
