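Deploying hertz-dev locally: an open-source, ultra-low-latency audio generation model! Runs on an RTX 4090 with a theoretical latency of 80 ms! An end-to-end speech generation model for real-time voice interaction, with 8.5 billion parameters and full-duplex conversation that flows like talking to a real person #hertzdev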

4 min read · Nov 5, 2024

🔥🔥🔥 Video for this note: https://youtu.be/_Vw1rJrByO8

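🚀 Hertz-Dev is an open-source, full-duplex audio generation base model released by Standard Intelligence, with 8.5 billion parameters. 🚀 The model is aimed at improving real-time conversational AI, particularly audio interaction: it reaches a theoretical latency as low as 80 ms and a measured latency of about 120 ms, and it runs on a single NVIDIA RTX 4090. 🚀 Standard Intelligence also plans to scale the Hertz models to 70 billion parameters to push real-time conversational AI further.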
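As a rough sanity check on the single-GPU claim, the sketch below estimates the memory taken by the weights alone. It assumes 16-bit weights (2 bytes per parameter), which is an assumption and not something stated above, and compares the result with the 24 GB of VRAM on an RTX 4090. Activations and caches need extra memory on top of this.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 8.5 billion parameters, assumed to be stored in 16-bit precision
params = 8.5e9
print(f"weights: {weight_memory_gib(params):.1f} GiB")
```

Under that assumption the weights come to just under 16 GiB, which leaves some headroom on a 24 GB card.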

model https://huggingface.co/si-community/hertz-dev/tree/main

github https://github.com/Standard-Intelligence/hertz-dev

Notebook code https://github.com/Standard-Intelligence/hertz-dev/blob/main/inference.ipynb

sudo apt-get update
sudo apt-get install portaudio19-dev python3-pyaudio
pip install --upgrade sounddevice

# Create and activate the environment
conda create -n myenv python=3.10 -y
conda activate myenv

# Install dependencies
conda install pytorch torchaudio -c pytorch -y
conda install numpy matplotlib ipython jupyter jupyterlab -y
conda install websockets -y
pip install einops tqdm soundfile requests sounddevice fastapi uvicorn typing_extensions websocket

# Clone the code
git clone https://github.com/Standard-Intelligence/hertz-dev.git
cd hertz-dev
pip install -r requirements.txt

# Set a password before starting JupyterLab
jupyter server password

# Then start it in the foreground...
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser

# ...or in the background, logging to jupyter.log
nohup jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser > jupyter.log 2>&1 &

# Start the inference server, then the client (in a separate terminal)
python inference_server.py
python inference_client.py
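If the server fails at startup, a missing package is a likely cause. The following is a minimal sketch, not part of the hertz-dev repository, that reports which of the packages installed above cannot be found in the active environment. The `REQUIRED` list is an assumed mapping from the install commands to import names; adjust it to your setup.

```python
import importlib.util

# Import names for the packages installed above (an assumed mapping, edit as needed)
REQUIRED = [
    "torch", "torchaudio", "einops", "tqdm", "soundfile",
    "requests", "sounddevice", "fastapi", "uvicorn", "websockets",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast and does not load PyTorch or touch the audio device.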

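👉👉👉 If you have questions, contact me on WeChat: stoeng

🔥🔥🔥 The code for this project was produced by the AI超元域 channel. For more videos on fine-tuning large models, visit my channels ⬇

👉👉👉 My Bilibili channel

👉👉👉 My YouTube channel

👉👉👉 My open-source project https://github.com/win4r/AISuperDomain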

Client configuration

# Create and activate the client environment
conda create -n audio-env python=3.11
conda activate audio-env

# Install base dependencies
conda install numpy websockets requests -y

# Install audio-related dependencies
conda install portaudio -y
pip install sounddevice soundfile websocket-client

# asyncio and base64 are part of the Python standard library, so no pip install is needed

# Run
python client.py --server ws://localhost:8000 --token_temp 0.8 --categorical_temp 0.5 --gaussian_temp 0.1
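The three `*_temp` flags are sampling temperatures. How the client applies each of them inside hertz-dev is not covered in this note, so the sketch below is only a generic illustration of what a temperature does when sampling from a categorical distribution: logits are divided by the temperature before the softmax, so lower values concentrate probability on the most likely choice. The function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits; compare a neutral temperature with a lower one
logits = [2.0, 1.0, 0.0]
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

This covers only the categorical case. If the output sounds erratic, lowering the temperatures is a reasonable first thing to try, and raising them is worth trying if it sounds flat or repetitive.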

Written by AI超元域

