Building a RAG System (Basic Edition)

2025-02-20 Thursday

python

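1. Python environment setup
   1.1. Installation
   1.2. Environment configuration
2. Model download
3. Installing dependencies
4. Inference test with the local model
   4.1. Running the code
   4.2. Test results
5. Creating the knowledge base (RAG)
6. Summary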

A simple walkthrough of building a RAG system around a large language model.

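RAG (Retrieval-Augmented Generation) is a framework that combines information retrieval with text generation. By pairing traditional retrieval-based methods with a modern generative model, it improves the quality and accuracy of the answers the model gives.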

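Python environment setup

Installation

Use Anaconda to set up the Python environment; it can be installed directly from the system's app store.

Environment configuration

Open Anaconda PowerShell Prompt.
1. Create an environment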

conda create --name envName python=3.10  # 3.10 is currently the safest choice; most libraries are compatible with it

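2. Activate the environment

conda activate envName

3. List the existing environments

conda env list

4. Remove an environment

# Make sure the currently active environment is not the envName being removed
conda env remove --name envName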

5. Install dependencies

# Install a single package
pip install package_name

# Install everything listed in requirements.txt
pip install -r requirements.txt

6. Use a different mirror

pip install package_name -i https://pypi.tuna.tsinghua.edu.cn/simple

Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
Huawei Cloud: https://repo.huaweicloud.com/repository/pypi/simple

7. List the packages installed in the current environment

conda list
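
Once the environment is activated, it can be worth confirming that the interpreter in use really is the 3.10 one created above. A minimal sketch; the helper name `python_matches` is made up for illustration:

```python
import sys

def python_matches(required=(3, 10), current=None):
    """Return True if the major.minor version equals `required`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required)

# The executable path should point inside the conda environment directory.
print(sys.executable)
print("Python 3.10 active:", python_matches())
```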

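Model download

Models are hosted on HuggingFace, but that site cannot be reached from mainland China without a proxy, so ModelScope is used here instead.

Install the modelscope package

pip install modelscope -i https://repo.huaweicloud.com/repository/pypi/simple

Downloading the Sentence Transformer model

Before doing RAG, an embedding model is needed to turn text into vectors. Run the following to download it.

# Download the model
from modelscope import snapshot_download

# model_id: the ID of the model
# cache_dir: the directory to download the model into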
model_dir = snapshot_download(model_id='Ceceliachenen/paraphrase-multilingual-MiniLM-L12-v2', cache_dir='D:/jinux/models')

Downloading the LLM

The Qwen2.5-0.5B-Instruct model is used here.

from modelscope import snapshot_download

model_dir = snapshot_download(model_id='Qwen/Qwen2.5-0.5B-Instruct', cache_dir='D:/jinux/models')
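
The local paths used later in this post (for example `Qwen2___5-0___5B-Instruct`) suggest that ModelScope writes each `.` in the model name as `___` on disk. The sketch below reproduces that mapping under this assumption; `expected_model_dir` is a hypothetical helper, and the value returned by `snapshot_download` (`model_dir` above) remains the authoritative path, since the folder layout may differ between modelscope versions.

```python
from pathlib import PureWindowsPath

def expected_model_dir(cache_dir, model_id):
    """Guess the download folder for `model_id` under `cache_dir`.

    Assumption (inferred from the paths in this post, not from the
    modelscope documentation): every '.' in the model name becomes '___'.
    """
    owner, name = model_id.split("/", 1)
    return PureWindowsPath(cache_dir) / owner / name.replace(".", "___")

print(expected_model_dir("D:/jinux/models", "Qwen/Qwen2.5-0.5B-Instruct"))
```

If the printed path does not exist after the download, printing `model_dir` shows where the files actually went.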

Installing dependencies

Libraries required by the examples below:

pip install llama_index -i https://repo.huaweicloud.com/repository/pypi/simple
pip install llama_index.llms.huggingface -i https://repo.huaweicloud.com/repository/pypi/simple
pip install transformers -i https://repo.huaweicloud.com/repository/pypi/simple
pip install llama_index.embeddings.huggingface -i https://repo.huaweicloud.com/repository/pypi/simple

Inference test with the local model

Running the code

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage

# Load the local LLM through HuggingFace
llm = HuggingFaceLLM(
  # Full path to the local model directory
  model_name=r"D:\jinux\models\Qwen\Qwen2___5-0___5B-Instruct", # directory of the downloaded model
  tokenizer_name=r"D:\jinux\models\Qwen\Qwen2___5-0___5B-Instruct", # same directory, used for the tokenizer
  model_kwargs={"trust_remote_code":True},
  tokenizer_kwargs={"trust_remote_code":True}
)
# Prompt (in Chinese): "Please briefly introduce jinux"
rsp = llm.chat(messages=[ChatMessage(content="请简单的介绍一下jinux")])
print(rsp)

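Test results (model output, translated from Chinese)

# 1
Jenkins is an open-source automation build tool that lets developers......

# 2
 JLinux is a Linux-based open-source operating system, by Alibaba Cloud......

The answer is wrong every time: the model has no knowledge of jinux, so it guesses at similarly named things.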

Creating the knowledge base (RAG)

Running the code

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

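# Initialize a HuggingFaceEmbedding object, used to convert text into vectors
# model_name points to the local path of a pretrained sentence-transformer model
embed_model = HuggingFaceEmbedding(
    model_name=r"D:\jinux\models\Ceceliachenen\paraphrase-multilingual-MiniLM-L12-v2"
)

# Assign the embedding model to the global Settings.embed_model,
# so that it is used when the index is built later.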
Settings.embed_model = embed_model

# Inference (generation) model
llm = HuggingFaceLLM(
  # Full path to the local model directory
  model_name=r"D:\jinux\models\Qwen\Qwen2___5-0___5B-Instruct", # directory of the downloaded model
  tokenizer_name=r"D:\jinux\models\Qwen\Qwen2___5-0___5B-Instruct", # same directory, used for the tokenizer
  model_kwargs={"trust_remote_code":True},
  tokenizer_kwargs={"trust_remote_code":True}
)

# Set the global llm so that it is used when the index is queried
Settings.llm = llm

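# Building the RAG system
# Read all documents from the given directory into memory; required_exts restricts loading to the listed extensions
documents = SimpleDirectoryReader("./documents", required_exts=[".txt"]).load_data()

# Create a VectorStoreIndex from the loaded documents
# The index converts the documents into vectors and stores them for fast retrieval
# By default the vectors are kept in memory
index = VectorStoreIndex.from_documents(documents)

# Create a query engine, which accepts a query and returns a response based on the relevant documents
query_engine = index.as_query_engine()
# Query (in Chinese): "Please briefly introduce jinux"
response = query_engine.query("请简单的介绍一下jinux")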

print(response)

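Test results (model output, translated from Chinese)

He is an older front-end programmer.
His work history and professional background center on the age crisis he has run into at work and on the AI knowledge he is currently learning. Because he is an older programmer, he may face some challenges at work, such as the risk of being laid off. He therefore needs to keep learning new skills and knowledge to cope with these risks.

The result is correct, because the provided documents contain this information.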
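
For reference, the knowledge base here is just a folder of `.txt` files. A minimal sketch of preparing one, assuming a hypothetical file name `jinux.txt` and placeholder text (the actual documents used for the result above are not shown in this post):

```python
from pathlib import Path

def write_sample_document(directory="./documents"):
    """Create the folder read by SimpleDirectoryReader and add one .txt file.

    The file name and its text are placeholders for illustration only.
    """
    doc_dir = Path(directory)
    doc_dir.mkdir(parents=True, exist_ok=True)
    doc_path = doc_dir / "jinux.txt"
    doc_path.write_text(
        "jinux is a front-end programmer who is currently learning about AI.\n",
        encoding="utf-8",
    )
    return doc_path
```

Calling `write_sample_document()` before running the RAG script gives `SimpleDirectoryReader("./documents", required_exts=[".txt"])` something to load.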

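How it works

The diagram below illustrates how the code above works.

Summary

This RAG system is very simple, but it shows the basic principle: text is converted into vectors, vector search finds the text most similar to the query, and the retrieved text is then handed to the LLM to produce the final answer.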
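
The flow summarized here can be sketched without any model at all. The toy example below swaps the real embedding model for simple word counts and stops at building the prompt instead of calling an LLM, so it only illustrates the steps (vectorize, compare, pick the closest text); all function names and sample texts are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, context):
    """Combine the retrieved text and the question into one prompt for the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

# Placeholder knowledge base and question.
chunks = [
    "jinux is a front-end programmer who is learning about AI",
    "docker packages applications into containers",
]
query = "who is jinux"
best = retrieve(query, chunks)
print(build_prompt(query, best))
```

In the real system, `HuggingFaceEmbedding` plays the role of `embed`, `VectorStoreIndex` handles storage and similarity search, and the query engine sends the assembled prompt to the LLM.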
Source code
