【STM32F769】Building an Ollama Forwarding Server - EEPW Forum

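【Introduction】

Once a local DeepSeek server is set up, you can send it an HTTP POST and read back the reply. The server returns the reply in chunks rather than as a single block of data.

When the STM32F769 processes this data, the reply is sometimes large enough to overflow its buffer. So I set up one more service locally: a forwarding server that extracts only the useful information and passes it on to the client.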
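To make the size difference concrete, the sketch below parses a few lines in the newline-delimited JSON shape that the `/api/generate` endpoint streams, and keeps only the `response` fields. The sample lines are invented for illustration (their values are not real server output), and `extract_text` is a hypothetical helper name.

```python
import json

# Illustrative sample only: three lines in the shape of a streamed reply.
SAMPLE_STREAM = (
    b'{"model":"deepseek-r1:1.5b","created_at":"2025-01-01T00:00:00Z","response":"Hel","done":false}\n'
    b'{"model":"deepseek-r1:1.5b","created_at":"2025-01-01T00:00:01Z","response":"lo","done":false}\n'
    b'{"model":"deepseek-r1:1.5b","created_at":"2025-01-01T00:00:02Z","response":"","done":true}\n'
)

def extract_text(stream: bytes) -> str:
    """Concatenate the `response` field of every JSON line in the stream."""
    parts = []
    for line in stream.splitlines():
        if line.strip():
            parts.append(json.loads(line).get("response", ""))
    return "".join(parts)

text = extract_text(SAMPLE_STREAM)
print(len(SAMPLE_STREAM), "bytes streamed ->", len(text.encode("utf-8")), "bytes of useful text")
```

Most of each streamed line is metadata the microcontroller never uses, which is why forwarding only the joined text keeps the reply small.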

【Implementation Steps】

1. First, import the required modules:

import http.server
import socketserver
import http.client
import json
import re

2. Define the host, the port (11434), and the model name:

OLLAMA_HOST = 'localhost'
OLLAMA_PORT = 11434
MODEL_NAME = "deepseek-r1:1.5b"

3. Write the forwarding service

In the POST handler, first read the post_data sent by the client, add the model to it, then repackage it and forward it to the ollama server:

# Handler class that is passed to the server in step 8
class OllamaProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Treat a missing Content-Length header as an empty body
        content_length = int(self.headers.get('Content-Length', 0))
        post_data = self.rfile.read(content_length)
        try:
            request_json = json.loads(post_data)
            request_json["model"] = MODEL_NAME
            modified_post_data = json.dumps(request_json).encode('utf-8')

            # Keep only the necessary request headers
            necessary_headers = {
                'Content-Type': 'application/json',
                'Content-Length': str(len(modified_post_data))
            }

            conn = http.client.HTTPConnection(OLLAMA_HOST, OLLAMA_PORT)
            print(f"Forwarding request: {self.path}, {necessary_headers}, {modified_post_data}")
            conn.request('POST', self.path, modified_post_data, necessary_headers)
            response = conn.getresponse()
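As a side note, the request-rewriting part of this step can be tried on its own, without a running server. The sketch below is a standalone illustration; `inject_model` is a hypothetical helper name, not part of the original handler.

```python
import json

MODEL_NAME = "deepseek-r1:1.5b"

def inject_model(raw_body: bytes, model: str = MODEL_NAME):
    """Return (body, headers) ready to forward, with the model name forced."""
    request_json = json.loads(raw_body)
    request_json["model"] = model
    body = json.dumps(request_json).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Content-Length must describe the rewritten body, not the original one
        "Content-Length": str(len(body)),
    }
    return body, headers

body, headers = inject_model(b'{"prompt": "hello"}')
print(headers["Content-Length"], body)
```

Because the proxy fills in the model, the client only has to send a prompt and never needs to know which model is installed.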

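4. Once the status code and headers come back from ollama, pass the status code on to the requesting client:

            # Print the response status code and headers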
            print(f"Response status: {response.status}")
            print(f"Response headers: {response.getheaders()}")
 
            self.send_response(response.status)

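5. The server replies with chunked transfer encoding, so I strip that header:

            # Drop the chunked transfer-encoding header; Content-Length is
            # recomputed later, so any upstream value is dropped as well
            headers_to_send = [
                (header, value) for header, value in response.getheaders()
                if header.lower() not in ('transfer-encoding', 'content-length')
            ]
            for header, value in headers_to_send:
                self.send_header(header, value)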
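The reason for this step is that the proxy sends the whole body in one piece with its own Content-Length, so a forwarded `Transfer-Encoding: chunked` header would describe a framing the client never receives. A standalone sketch of the filter follows; `filter_headers` is a hypothetical name, the sample headers are illustrative, and dropping an upstream Content-Length too is an extra precaution assumed here.

```python
def filter_headers(headers, drop=("transfer-encoding", "content-length")):
    """Keep every (name, value) pair whose name is not in `drop` (case-insensitive)."""
    dropped = {name.lower() for name in drop}
    return [(name, value) for name, value in headers if name.lower() not in dropped]

# Illustrative upstream headers, not captured from a real server
upstream = [
    ("Content-Type", "application/x-ndjson"),
    ("Transfer-Encoding", "chunked"),
    ("Date", "Wed, 01 Jan 2025 00:00:00 GMT"),
]
print(filter_headers(upstream))
```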

6. Next, strip what is not needed, such as the think tags and the other fields, keeping only the content of response and joining it back together:

            # Collect the content of every response field
            response_content = ""
            while True:
                line = response.readline()
                if not line:
                    break
                try:
                    json_obj = json.loads(line)
                    print(f"Parsed JSON: {json_obj}")
                    # Remove the <think> and </think> tags
                    response_text = json_obj.get('response', "")
                    response_text = re.sub(r'<think>|</think>', '', response_text)
                    # Replace \n or \n\n with \r\n
                    response_text = re.sub(r'\n+', '\r\n', response_text)
                    response_content += response_text
                except json.JSONDecodeError:
                    print(f"Invalid JSON line: {line.decode('utf-8').strip()}")
                    continue  # Ignore non-JSON lines and keep processing the rest
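The same filtering logic can be exercised offline by feeding it a list of lines instead of a live response. This is a minimal sketch: `collect_response_text` is a hypothetical name and the sample lines are made up.

```python
import json
import re

def collect_response_text(lines):
    """Join the `response` fields of NDJSON lines, minus the <think> tags."""
    content = ""
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        text = obj.get("response", "")
        text = re.sub(r"</?think>", "", text)  # removes the tag markers only
        text = re.sub(r"\n+", "\r\n", text)    # one CRLF per run of newlines
        content += text
    return content

# Made-up sample lines in the streamed shape
sample = [
    b'{"response": "<think>", "done": false}',
    b'{"response": "\\n\\n", "done": false}',
    b'{"response": "</think>", "done": false}',
    b'{"response": "Hi", "done": false}',
    b'not json',
    b'{"response": "", "done": true}',
]
print(repr(collect_response_text(sample)))
```

Note that the newline substitution runs on each chunk separately, so a run of newlines that happens to be split across two chunks would produce two `\r\n` pairs rather than one. Also, only the tag markers are removed; any text the model emits between them is kept.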

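7. Recompute the length of the returned content and send it back to the client:

            # Compute the content length in bytes
            response_content_bytes = response_content.encode('utf-8')
            content_length = len(response_content_bytes)
            self.send_header('Content-Length', str(content_length))
            self.end_headers()

            # Send the response content to the client in one write
            self.wfile.write(response_content_bytes)

            conn.close()
        except Exception as e:
            # Minimal error branch (an assumption; the excerpt omits it): log the failure
            print(f"Error while forwarding request: {e}")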
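The length has to be taken from the encoded bytes, not from the string, because Content-Length counts bytes and non-ASCII text takes more than one byte per character in UTF-8. A short illustration:

```python
text = "你好"
body = text.encode("utf-8")

# Two characters, but three bytes each once encoded as UTF-8
print(len(text), len(body))  # 2 6
```

Using `len(text)` here would advertise 2 bytes for a 6-byte body, and the client would stop reading too early.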

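8. Finally, set the port to 8000 and start the forwarding server:

PORT = 8000
with socketserver.TCPServer(("", PORT), OllamaProxyHandler) as httpd:
    print(f"Serving at port {PORT}")
    # serve_forever() must run inside the with block, before the server is closed
    httpd.serve_forever()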
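One practical detail when restarting the proxy often during development: a plain `TCPServer` may fail to rebind the port for a short while after it exits. A possible variation, sketched below, sets `allow_reuse_address` on a subclass. `ReusableTCPServer`, `PingHandler`, and `fetch_once` are hypothetical names; the placeholder handler stands in for `OllamaProxyHandler` so the sketch runs without ollama.

```python
import http.client
import http.server
import socketserver
import threading

class ReusableTCPServer(socketserver.TCPServer):
    # Let the port be rebound right after a restart
    allow_reuse_address = True

class PingHandler(http.server.BaseHTTPRequestHandler):
    """Placeholder handler; the proxy would use OllamaProxyHandler instead."""
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once() -> bytes:
    # Port 0 asks the OS for any free port, so this never collides with 8000
    with ReusableTCPServer(("127.0.0.1", 0), PingHandler) as httpd:
        port = httpd.server_address[1]
        thread = threading.Thread(target=httpd.serve_forever, daemon=True)
        thread.start()
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        data = conn.getresponse().read()
        conn.close()
        httpd.shutdown()  # stop serve_forever() from this (different) thread
        thread.join()
        return data

print(fetch_once())
```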

【Writing the Test Code】

import requests
import json
LOCAL_SERVER_HOST = 'localhost'
LOCAL_SERVER_PORT = 8000
url = f'http://{LOCAL_SERVER_HOST}:{LOCAL_SERVER_PORT}/api/generate'
data = {
    "prompt": "你好"
}
json_data = json.dumps(data).encode('utf-8')
headers = {
    'Content-Type': 'application/json'
}
try:
    response = requests.post(url, data=json_data, headers=headers)
    if response.status_code == 200:
        print("Response from server:")
        print(response.text)
    else:
        print(f"Request failed with status code {response.status_code}: {response.text}")
except requests.RequestException as e:
    print(f"An error occurred while making the request: {e}")

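After running it, the content we get back is much simpler.

【Summary】

Talking to the ollama server directly returns a lot of information, but I only care about the fields I need. After this processing, handling the reply on the microcontroller becomes much simpler.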
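On the microcontroller side, the request to the proxy ultimately goes out as raw bytes over a TCP socket. The firmware itself is not shown in this post, so the following is only a sketch, in Python for consistency with the rest of the code, of what such a request could look like. `build_raw_post` and the address `192.168.1.10` are assumptions made for illustration.

```python
import json

def build_raw_post(host: str, port: int, path: str, prompt: str) -> bytes:
    """Build the bytes of a minimal HTTP/1.1 POST carrying {"prompt": ...}."""
    # json.dumps escapes non-ASCII characters by default, so the body is pure ASCII
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

# Hypothetical address of the PC running the proxy
raw = build_raw_post("192.168.1.10", 8000, "/api/generate", "你好")
print(raw.decode("ascii"))
```

Since the proxy answers with a plain body and a Content-Length header, the client can read exactly that many bytes and does not need a chunked-encoding or JSON parser.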

Attached code:

post_test.zip
