Mistral AI重磅推出Pixtral 12B开源多模态大模型!vLLM本地部署Pixtral轻松实现视频智能分析,打造你的AI视觉助手-从图像识别到视频分析!#mistral #pixtral
🔥🔥🔥本篇笔记所对应的视频 https://youtu.be/nu4qK9xVD3A
pixtral 12B轻松实现视频智能分析
Pixtral 12B是由法国初创公司Mistral AI最新推出的多模态人工智能模型。这个模型意义重大,因为它代表了Mistral首次尝试结合文本和图像处理能力,使其有望与OpenAI和Anthropic等公司的领先人工智能模型展开竞争。
Pixtral 12B的主要特点
模型架构和参数
- 基础模型:基于之前发布的文本模型Nemo 12B。
- 总参数量:跨40层共120亿参数。
- 视觉适配器:集成了4亿参数的视觉适配器,增强了处理视觉数据的能力。
- 隐藏维度:具有14,336个隐藏维度和32个注意力头,实现了广泛的计算处理。
- 图像分辨率:能够处理1024 x 1024像素分辨率的图像,将其分割成16 x 16像素的小块。
- 词元词汇量:扩展了词汇量至131,072个词元,包括三个新的专用于图像处理的特殊词元:
img
、img_break
和img_end
。 - 位置编码:采用2D RoPE(旋转位置嵌入)来增强对图像空间关系的理解。
功能
Pixtral 12B允许用户通过URL或base64编码输入图像,能够执行多种任务,如:
- 图像描述:为上传的图像生成描述性文字。
- 物体识别:识别并计数图像中的物体。
- 视觉问答:根据图像内容回答问题。
该模型的架构支持同时处理文本和图像,使其在内容分析和数据解释方面的应用具有多样性。
可访问性和许可
Pixtral 12B可通过GitHub和Hugging Face等多个平台下载,也可以通过磁力链接获取。它以Apache 2.0许可证发布,允许用户自由使用、修改和商业化该模型。然而,用于训练模型的具体数据集尚未公开,这引发了对训练数据可能涉及的版权问题的质疑。此外,其完整的许可条款尚未完全明确,预计未来将有更多相关信息发布。
未来发展
Mistral计划将Pixtral 12B整合到其聊天机器人Le Chat和API平台La Platforme中,使开发者和用户更容易使用。随着人工智能社区开始试验Pixtral 12B,预计将会对其能力和性能有更深入的了解。
Pixtral 12B标志着Mistral AI在多模态人工智能领域迈出了重要一步,有望增强生成式人工智能应用的能力。这个模型通过结合先进的文本处理能力和新增的视觉处理功能,为用户提供了一个强大而灵活的多模态AI工具。
环境
- Ubuntu
- Nvidia RTX A6000*2
- vLLM
👉👉👉如有问题请联系我的徽信 stoeng
🔥🔥🔥本项目代码由AI超元域频道制作,观看更多大模型微调视频请访问我的频道⬇
👉👉👉我的哔哩哔哩频道
👉👉👉我的YouTube频道
👉👉👉我的开源项目 https://github.com/win4r/AISuperDomain
安装
pip install --upgrade mistral_common vllm
vllm serve mistralai/Pixtral-12B-2409 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4' --max_num_batched_tokens 16384 --gpu-memory-utilization 0.95 --max-model-len 65536vllm serve mistralai/Pixtral-12B-2409 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4' --max_num_batched_tokens 65536 --gpu-memory-utilization 0.95 --max-model-len 65536pip install -U "huggingface_hub[cli]"huggingface-cli logincurl http://64.247.196.11:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Pixtral-12B-2409",
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'
curl --location 'http://64.247.196.11:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data '{
"model": "mistralai/Pixtral-12B-2409",
"messages": [
{
"role": "user",
"content": [
{"type" : "text", "text": "Describe this image in detail please in Chinese."},
{"type": "image_url", "image_url": {"url": "https://s3.amazonaws.com/cms.ipressroom.com/338/files/201808/5b894ee1a138352221103195_A680%7Ejogging-edit/A680%7Ejogging-edit_hero.jpg"}},
{"type" : "text", "text": "and this one as well. Answer in Chinese."},
{"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}}
]
}
]
}'from openai import OpenAI# 正确初始化 OpenAI 客户端
client = OpenAI(
base_url="http://64.247.196.11:8000/v1",
api_key="test"
)response = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://s3.amazonaws.com/cms.ipressroom.com/338/files/201808/5b894ee1a138352221103195_A680%7Ejogging-edit/A680%7Ejogging-edit_hero.jpg",
},
},
],
}
],
max_tokens=1024,
)print(response.choices[0])
import base64
from openai import OpenAIdef encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')# 初始化 OpenAI 客户端
client = OpenAI(
base_url="http://64.247.196.11:8000/v1",
api_key="test"
)# 本地图片路径
image_path = "./dog.jpg"# 编码图片
base64_image = encode_image(image_path)response = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}",
},
},
],
}
],
max_tokens=1024,
)print(response.choices[0])
stream
from openai import OpenAI
import sys
# 初始化 OpenAI 客户端
client = OpenAI(
base_url="http://64.247.196.11:8000/v1",
api_key="test"
)# 创建流式 completion 请求
stream = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://s3.amazonaws.com/cms.ipressroom.com/338/files/201808/5b894ee1a138352221103195_A680%7Ejogging-edit/A680%7Ejogging-edit_hero.jpg",
},
},
],
}
],
max_tokens=1024,
stream=True # 启用流式输出
)# 处理流式响应
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end='', flush=True)print() # 最后打印一个换行
import base64
import sys
from openai import OpenAI
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# 初始化 OpenAI 客户端
client = OpenAI(
base_url="http://64.247.196.11:8000/v1",
api_key="test"
)# 本地图片路径
image_path = "./dog.jpg"# 编码图片
base64_image = encode_image(image_path)# 创建流式 completion 请求
stream = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}",
},
},
],
}
],
max_tokens=1024,
stream=True # 启用流式输出
)# 处理流式响应
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end='', flush=True)print() # 最后打印一个换行
UI Chatbot
#pip install streamlit
#streamlit run bot.py
import streamlit as st
import base64
from openai import OpenAI
from io import BytesIO
def encode_image(image_file):
return base64.b64encode(image_file.getvalue()).decode('utf-8')
def get_chat_response(client, messages):
stream = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=messages,
max_tokens=1024,
stream=True
)
return stream
st.title("多模态Chatbot")# 初始化OpenAI客户端
client = OpenAI(
base_url="http://64.247.196.11:8000/v1",
api_key="test"
)# 初始化会话状态
if "messages" not in st.session_state:
st.session_state.messages = []# 显示聊天历史
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])# 图片上传
uploaded_file = st.file_uploader("上传图片", type=["jpg", "jpeg", "png"])# 用户输入
user_input = st.chat_input("输入你的问题")if uploaded_file and user_input:
# 编码图片
base64_image = encode_image(uploaded_file) # 创建新的用户消息
new_message = {
"role": "user",
"content": [
{"type": "text", "text": user_input},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}",
},
},
]
} st.session_state.messages.append({"role": "user", "content": user_input}) with st.chat_message("user"):
st.markdown(user_input)
st.image(uploaded_file, caption="上传的图片", use_column_width=True) # 获取AI响应
with st.chat_message("assistant"):
message_placeholder = st.empty()
full_response = ""
for chunk in get_chat_response(client, [new_message]):
if chunk.choices[0].delta.content is not None:
full_response += chunk.choices[0].delta.content
message_placeholder.markdown(full_response + "▌")
message_placeholder.markdown(full_response) st.session_state.messages.append({"role": "assistant", "content": full_response})st.sidebar.title("使用说明")
st.sidebar.markdown("""
1. 上传一张图片
2. 在输入框中输入你的问题
3. 等待AI分析图片并回答你的问题
""")
chainlit
import base64
import chainlit as cl
from openai import OpenAI
# 初始化 OpenAI 客户端
client = OpenAI(
base_url="http://64.247.196.48:8000/v1",
api_key="test"
)def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')@cl.on_chat_start
async def start():
await cl.Message("欢迎使用图像分析应用!请选择一张图片并输入您的问题,然后点击发送。").send()@cl.on_message
async def main(message: cl.Message):
if not message.elements:
await cl.Message("请上传一张图片并输入您的问题。").send()
return image = message.elements[0]
if not image.mime.startswith("image"):
await cl.Message("请上传一个有效的图片文件。").send()
return question = message.content
if not question:
await cl.Message("请输入关于图片的问题。").send()
return await process_image(image.path, question)async def process_image(image_path, question):
base64_image = encode_image(image_path) stream = client.chat.completions.create(
model="mistralai/Pixtral-12B-2409",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": question},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}",
},
},
],
}
],
max_tokens=1024,
stream=True
) msg = cl.Message(content="")
await msg.send() full_response = ""
for chunk in stream:
if chunk.choices[0].delta.content is not None:
content = chunk.choices[0].delta.content
full_response += content
await msg.stream_token(content) await msg.update()if __name__ == "__main__":
cl.run()