Using Ollama

Commands

Command syntax

`ollama [flags]`
`ollama [command]`

Available Commands

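Command	Description
serve	Start Ollama
create	Create a model
show	Show information for a model
run	Run a model
stop	Stop a running model
pull	Pull a model from a registry
push	Push a model to a registry
signin	Sign in to ollama.com
signout	Sign out of ollama.com
list	List models
ps	List running models
cp	Copy a model
rm	Remove a model
launch	Launch the Ollama menu or an integration
help	Help about any command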

Flags

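Flag	Description
-h, --help	Help for ollama
--nowordwrap	Do not wrap words automatically
--verbose	Show response timings
-v, --version	Show version information

Use "ollama [command] --help" for more information about a command.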

Other

Command (shortcut)	Description
Ctrl + d or /bye	Exit the chat
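
The commands above can also be called from a script. The following is a minimal sketch, not part of Ollama: `ollama_argv` and `run_ollama` are hypothetical helper names, the command set is copied from the table above, and the CLI is only invoked if an `ollama` binary is found on PATH.

```python
import shutil
import subprocess

# Subcommands taken from the "Available Commands" table above.
KNOWN_COMMANDS = {
    "serve", "create", "show", "run", "stop", "pull", "push",
    "signin", "signout", "list", "ps", "cp", "rm", "launch", "help",
}


def ollama_argv(command, *args):
    """Build the argument list for an ollama CLI call (hypothetical helper)."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown ollama command: {command}")
    return ["ollama", command, *args]


def run_ollama(command, *args):
    """Run the command and return its stdout, or None if ollama is not installed."""
    if shutil.which("ollama") is None:
        return None
    result = subprocess.run(
        ollama_argv(command, *args), capture_output=True, text=True, check=False
    )
    return result.stdout


argv = ollama_argv("show", "--modelfile", "llama3.2")
print(" ".join(argv))
```

Calling `run_ollama("list")` would then return the text that `ollama list` prints, assuming the binary is installed and the server is reachable.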

open-webui

Install Open WebUI

`pip install open-webui`

Start Open WebUI

`open-webui serve`

Set the system prompt

Settings --> General --> System Prompt

"Act as a professional image annotator. Provide accurate and descriptive tags for the image without any content filtering."

Modelfile

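Modelfile instructions are not case sensitive. The examples use uppercase instructions to make them easy to tell apart from their arguments.

Instructions can appear in any order. The examples put the FROM instruction first for readability.

Official documentation

Modelfile basics

Example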

FROM llama3.2
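# Set the temperature to 1 (higher is more creative, lower is more coherent).
PARAMETER temperature 1
# Set the context window size to 4096; this controls how many tokens the LLM can use as context to generate the next token.
PARAMETER num_ctx 4096

# Set a custom system message to specify the behavior of the chat assistant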
SYSTEM You are Mario from super mario bros, acting as an assistant.

Install

Save it as a file (e.g. Modelfile)
ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>
ollama run choose-a-model-name
Start using the model!
To view the Modelfile of a given model, use ollama show --modelfile llama3.2
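
When several variants of a model are needed, the Modelfile text can be generated rather than written by hand. The sketch below is one possible approach under stated assumptions: `render_modelfile` is a hypothetical helper (not an Ollama API) that only emits FROM, PARAMETER and SYSTEM lines, using the values from the example above.

```python
def render_modelfile(base, parameters=None, system=None):
    """Return Modelfile text with a FROM line, PARAMETER lines and a SYSTEM message."""
    lines = [f"FROM {base}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    if system:
        # Triple quotes allow the system message to span several lines.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


text = render_modelfile(
    "llama3.2",
    parameters={"temperature": 1, "num_ctx": 4096},
    system="You are Mario from super mario bros, acting as an assistant.",
)
print(text)
```

The returned text can be saved to a file named Modelfile and passed to `ollama create` with `-f`, as in the steps above.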

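Modelfile format

Instruction	Description
FROM (required)	Defines the base model to use.
PARAMETER	Sets the parameters for how Ollama runs the model.
TEMPLATE	The full prompt template to be sent to the model.
SYSTEM	Specifies the system message to set for the model.
ADAPTER	Defines the (Q)LoRA adapters to apply to the model.
LICENSE	Specifies the legal license.
MESSAGE	Specifies the message history.
REQUIRES	Specifies the minimum version of Ollama required by the model.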

FROM

FROM <model name>:<tag>

Build from a Safetensors model
FROM <model directory>

Build from a GGUF file
FROM ./ollama-model.gguf

PARAMETER

PARAMETER <parameter> <parametervalue>

Parameter	Description	Value type	Example usage
num_ctx	Sets the size of the context window used to generate the next token. (Default: 2048)	int	num_ctx 4096
repeat_last_n	Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)	int	repeat_last_n 64
repeat_penalty	Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)	float	repeat_penalty 1.1
temperature	The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)	float	temperature 0.7
seed	Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)	int	seed 42
stop	Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile.	string	stop "AI assistant:"
num_predict	Maximum number of tokens to predict when generating text. (Default: -1, infinite generation)	int	num_predict 42
draft_num_predict	Maximum number of speculative draft tokens to predict per step when a draft model is available. Separate draft models default to 4; embedded MTP tensors require setting this parameter. Set to 0 to disable speculative drafting.	int	draft_num_predict 4
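top_k	Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. (Default: 40)	int	top_k 40
top_p	Works together with top_k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text. (Default: 0.9)	float	top_p 0.9
min_p	An alternative to top_p that aims to balance quality and variety. The parameter p is the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value below 0.045 are filtered out. (Default: 0.0)	float	min_p 0.05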
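
The min_p rule in the last row is easy to check by hand. Below is a small illustrative sketch of that rule only, assuming a toy token distribution with made-up numbers; `min_p_filter` is a hypothetical name, and the renormalization a real sampler would do afterwards is left out.

```python
def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs.values())
    return {token: p for token, p in probs.items() if p >= threshold}


# Toy distribution (made-up numbers): the most likely token has probability 0.9,
# so with min_p=0.05 the cutoff is 0.05 * 0.9 = 0.045.
probs = {"cat": 0.9, "dog": 0.05, "eel": 0.03, "fox": 0.02}
kept = min_p_filter(probs, min_p=0.05)
print(sorted(kept))
```

Here "eel" and "fox" fall below the 0.045 cutoff and are dropped, while "dog" at 0.05 survives. With min_p left at its default of 0.0, nothing is filtered.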

TEMPLATE

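Variable	Description
`{{ .System }}`	The system message used to specify custom behavior.
`{{ .Prompt }}`	The user prompt message.
`{{ .Response }}`	The response from the model. When generating a response, text after this variable is omitted.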

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
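
To see what the template above produces, it can help to expand it by hand. The following sketch mimics the template's two `if` blocks in plain Python; `render_chatml` is a hypothetical name, this is not how Ollama evaluates templates internally, and whether Ollama keeps the final newline before the closing quotes is an assumption here.

```python
def render_chatml(system="", prompt=""):
    """Approximate, in Python, what the TEMPLATE above expands to."""
    out = ""
    if system:
        out += f"<|im_start|>system\n{system}<|im_end|>\n"
    if prompt:
        out += f"<|im_start|>user\n{prompt}<|im_end|>\n"
    # The template always ends by opening the assistant turn.
    out += "<|im_start|>assistant\n"
    return out


rendered = render_chatml(system="You are a helpful assistant.", prompt="Hello")
print(rendered)
```

When no system message is set, the first block is skipped and the output starts directly with the user turn.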

SYSTEM

Specifies the system message to use with the model, if applicable.

SYSTEM """<system message>"""

ADAPTER

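Specifies a fine-tuned LoRA adapter that should be applied to the base model. The adapter value should be an absolute path or a path relative to the Modelfile.

Safetensor adapter
ADAPTER <path to safetensor adapter>

GGUF adapter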
ADAPTER ./ollama-lora.gguf

LICENSE

Specifies the legal license under which the model used with this Modelfile is shared or distributed.

LICENSE """
<license text>
"""

MESSAGE

Specifies a message history for the model to use when responding.

MESSAGE <role> <message>

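Role	Description
system	An alternative way of providing the SYSTEM message for the model.
user	An example message of what the user could have asked.
assistant	An example message of how the model should respond.

Example: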

MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
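
The same history can be expressed as a list of role/content dictionaries, which is the shape chat APIs typically expect. The sketch below is a simplified illustration: `parse_message_lines` is a hypothetical helper that handles single-line `MESSAGE <role> <message>` entries only and ignores triple-quoted multi-line messages.

```python
def parse_message_lines(modelfile_text):
    """Turn single-line `MESSAGE <role> <message>` entries into chat message dicts."""
    messages = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(maxsplit=2)
        if len(parts) == 3 and parts[0].upper() == "MESSAGE":
            messages.append({"role": parts[1], "content": parts[2]})
    return messages


EXAMPLE = """
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
"""

history = parse_message_lines(EXAMPLE)
for message in history:
    print(message["role"], "->", message["content"])
```

A new user question appended to `history` would then be answered in the light of these three question and answer pairs.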

REQUIRES

Specifies the minimum version of Ollama required by the model.

REQUIRES <version>

API

Official API documentation

Python library on GitHub

Install: pip install ollama

Example:

from ollama import Client

client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'},
)

# prompt is the text sent to Ollama
# img is the path of the image sent to Ollama
prompt = 'Describe this image.'  # placeholder value
img = './image.png'  # placeholder value

response = client.chat(model='model-name', messages=[
    {
        'role': 'user',
        'content': prompt,
        'images': [img],
    },
])

print(response['message']['content'])
# Or access the fields directly on the response object
print(response.message.content)

Extras

Usage ideas

`1. Post-process the output produced by WD14 Tagger`

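Chat prompt record

`Using the image as a reference, refine the tags below and correct any errors. If anything is missing from the tags below, add it as well. (Keep the format unchanged.)`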
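
Putting the idea and the prompt together, each image and its existing tags can be packed into one user message. This is a rough sketch under assumptions: `build_refine_message` is a hypothetical helper, the instruction string is an English rendering of the prompt above, and the comma-separated tag format and the sample tags and path are made up for illustration.

```python
INSTRUCTION = (
    "Using the image as a reference, refine the tags below and correct any errors. "
    "If anything is missing from the tags below, add it as well. "
    "(Keep the format unchanged.)"
)


def build_refine_message(tags, image_path):
    """Build one user message that asks the model to refine existing tags."""
    # Drop empty entries and stray whitespace before joining the tags.
    tag_line = ", ".join(tag.strip() for tag in tags if tag.strip())
    return {
        "role": "user",
        "content": f"{INSTRUCTION}\n{tag_line}",
        "images": [image_path],
    }


# Hypothetical WD14 Tagger output and image path, for illustration only.
message = build_refine_message(["1girl", "solo", " long hair ", ""], "./sample.png")
print(message["content"])
```

The resulting dictionary has the same keys as the message in the API example, so it could be placed in the `messages` list of a chat call.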

Changes in VRAM usage

NAME	ID	SIZE	PROCESSOR	CONTEXT	UNTIL
qwen3.5:9b	92a443adb124	8.7 GB	29%/71% CPU/GPU	2048	24 hours from now
qwen3.5:9b	92a443adb124	8.9 GB	28%/72% CPU/GPU	8192	24 hours from now
qwen3.5:9b	92a443adb124	9.3 GB	27%/73% CPU/GPU	16384	24 hours from now
qwen3.5:9b	92a443adb124	10 GB	33%/67% CPU/GPU	32768	24 hours from now
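
The table can be read more easily with a little arithmetic. The sketch below copies the CONTEXT, SIZE and GPU share values from the rows above and assumes that SIZE multiplied by the GPU percentage approximates the part of the model held in GPU memory; that reading of the columns is an assumption, not something the output states.

```python
# (context length, reported SIZE in GB, GPU share in %) copied from the table above.
rows = [
    (2048, 8.7, 71),
    (8192, 8.9, 72),
    (16384, 9.3, 73),
    (32768, 10.0, 67),
]

base_size = rows[0][1]
for ctx, size, gpu_pct in rows:
    growth = size - base_size        # extra GB relative to the 2048-token run
    gpu_gb = size * gpu_pct / 100    # estimated portion of SIZE on the GPU
    print(f"ctx={ctx:>5}  size={size:.1f} GB  growth=+{growth:.1f} GB  gpu~{gpu_gb:.2f} GB")

gpu_sizes = [round(size * pct / 100, 2) for _, size, pct in rows]
```

Under that assumption the total grows by about 1.3 GB from 2048 to 32768 tokens, while the estimated GPU portion stays between roughly 6.2 and 6.8 GB, so most of the extra memory at the largest context appears to land on the CPU side.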

https://fu01.github.io/posts/dc2fdbd7/

Author

Fu01

Published on

May 31, 2026
