Interactive Database Analysis with LangChain and Elasticsearch


Introduction

Use an LLM to turn a question into an Elasticsearch query, run that query against the Elasticsearch database, and use the results to answer the original question.
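The three-step flow can be sketched in plain Python. This is an illustration of the idea, not LangChain's implementation: `answer_question`, `generate_query`, `run_query` and `summarize` are hypothetical names, and the callables stand in for the LLM and the Elasticsearch client.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical sketch of the question -> query -> results -> answer flow.
# The three callables are placeholders for the LLM and the Elasticsearch
# client; they are not LangChain APIs.
def answer_question(
    question: str,
    generate_query: Callable[[str], str],
    run_query: Callable[[Dict[str, Any]], Dict[str, Any]],
    summarize: Callable[[str, Dict[str, Any]], str],
) -> str:
    # 1. The LLM turns the natural-language question into DSL (JSON text)
    query = json.loads(generate_query(question))
    # 2. The query is executed against Elasticsearch
    results = run_query(query)
    # 3. The LLM answers the original question using the search results
    return summarize(question, results)
```

Keeping the LLM and the search client as injected callables makes each step easy to stub out when experimenting.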

LangChain chain: ElasticsearchDatabaseChain

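This chain builds search queries with the Elasticsearch DSL API (filters and aggregations). The Elasticsearch client must have permission to list indices, read index mappings, and run search queries.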
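To make "filters and aggregations" concrete, here is a sketch of a DSL request body of the kind the chain asks the LLM to produce for the sample `customers` index used below. It assumes Elasticsearch's default dynamic mapping, which indexes strings as `text` with a `keyword` sub-field (hence `firstname.keyword`); `build_customer_query` is a hypothetical helper, not part of LangChain.

```python
from typing import Any, Dict

def build_customer_query(firstname: str, top_k: int = 10) -> Dict[str, Any]:
    """Hypothetical example of a DSL body combining a filter and an aggregation."""
    return {
        "size": top_k,                          # cap the number of returned hits
        "_source": ["firstname", "lastname"],  # fetch only the relevant fields
        "query": {
            "bool": {
                # Filter context: exact match on the keyword sub-field, no scoring
                "filter": [{"term": {"firstname.keyword": firstname}}]
            }
        },
        "aggs": {
            # Count matching documents per last name
            "by_lastname": {"terms": {"field": "lastname.keyword"}}
        },
    }
```

With a connected client, such a body could be sent as `db.search(index="customers", body=build_customer_query("Jennifer"))`.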

How to use ElasticsearchDatabaseChain

Installation

pip install elasticsearch langchain openai

Create test data

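from elasticsearch import Elasticsearch

# Connection URL with basic-auth credentials (replace "pass" with your password).
# For a local cluster with a self-signed certificate, also pass ca_certs="path/to/http_ca.crt".
ELASTIC_SEARCH_SERVER = "https://elastic:pass@localhost:9200"
db = Elasticsearch(ELASTIC_SEARCH_SERVER)

# Sample documents for the "customers" index
customers = [
    {"firstname": "Jennifer", "lastname": "Walters"},
    {"firstname": "Monica", "lastname": "Rambeau"},
    {"firstname": "Carol", "lastname": "Danvers"},
    {"firstname": "Wanda", "lastname": "Maximoff"},
    {"firstname": "Jennifer", "lastname": "Takeda"},
]

# Index each document under an explicit id; create() fails if that id already exists
for i, customer in enumerate(customers):
    db.create(index="customers", document=customer, id=i)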
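The loop above sends one request per document. For larger test sets, the official client's bulk helper can do the same in one pass; the sketch below assumes `elasticsearch.helpers.bulk`, and `customer_actions` / `index_customers` are hypothetical names. It also refreshes the index, because newly indexed documents only become searchable after the next refresh (every second by default), so a chain run immediately after indexing could otherwise see an empty index.

```python
from typing import Any, Dict, Iterable, Iterator

def customer_actions(
    customers: Iterable[Dict[str, Any]], index: str = "customers"
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk 'create' action per document, using the list position as id."""
    for i, customer in enumerate(customers):
        yield {"_op_type": "create", "_index": index, "_id": str(i), "_source": customer}

def index_customers(db, customers, index: str = "customers"):
    """Bulk-index the documents, then refresh so they are searchable immediately."""
    # Imported here so customer_actions() can be used without the client installed
    from elasticsearch import helpers

    result = helpers.bulk(db, customer_actions(customers, index))
    db.indices.refresh(index=index)
    return result  # (number of successful actions, list of errors)
```

As with `db.create`, the `create` operation fails for ids that already exist, so rerunning the script raises a bulk error; switching `_op_type` to `"index"` would overwrite the documents instead.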

Initialization

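from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain
from langchain.chat_models import ChatOpenAI

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable
llm = ChatOpenAI(model_name="gpt-4", temperature=0)
# verbose=True prints the intermediate steps (generated query and search results)
chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)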

Ask a question to see the result

question = "What are the first names of all the customers?"
chain.run(question)
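To cross-check the LLM's answer, you can run a comparable query yourself. The query below is an assumption about what the model might generate for this question, not captured chain output; `first_names` is a hypothetical helper that reads the `firstname` field from each hit of a search response.

```python
from typing import Any, Dict, List

# Assumed query for "first names of all customers": every document,
# but only the firstname field
FIRST_NAMES_QUERY: Dict[str, Any] = {
    "size": 10,
    "_source": ["firstname"],
    "query": {"match_all": {}},
}

def first_names(response: Dict[str, Any]) -> List[str]:
    """Pull the firstname field out of each hit in a search response."""
    return [hit["_source"]["firstname"] for hit in response["hits"]["hits"]]
```

For example, `first_names(db.search(index="customers", body=FIRST_NAMES_QUERY))` should list the five first names from the test data, with Jennifer appearing twice.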

Customizing the prompt

For the best results, you may need to customize the prompt.

from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE  # library default, useful as a reference
from langchain.prompts.prompt import PromptTemplate

# The chain fills in {top_k}, {indices_info} (index mappings) and {input} (the question),
# so a custom template must keep all three placeholders.
PROMPT_TEMPLATE = """Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Unless told to, do not query for all the columns from a specific index; only ask for the few relevant columns given the question.

Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.

Use the following format:

Question: Question here
ESQuery: Elasticsearch Query formatted as json

Only use the following Elasticsearch indices:
{indices_info}

Question: {input}"""

PROMPT = PromptTemplate.from_template(PROMPT_TEMPLATE)
chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)
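A quick way to check that a custom template keeps the placeholders the chain fills in is to parse it with the standard library's `string.Formatter`, which follows the same single-brace rules as LangChain's default f-string templates. The set of required names is an assumption taken from LangChain's default DSL prompt; `template_variables` and `missing_variables` are hypothetical helpers. Any literal JSON in a template must use doubled braces, otherwise it would show up here as a bogus variable name.

```python
from string import Formatter
from typing import Set

# Assumed: the placeholders ElasticsearchDatabaseChain fills in when it formats
# the query prompt (check against your installed LangChain version).
REQUIRED_VARIABLES = {"input", "top_k", "indices_info"}

def template_variables(template: str) -> Set[str]:
    """Return the {placeholder} names in an f-string style template.

    Doubled braces ({{ and }}) are literal text and are not reported.
    """
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def missing_variables(template: str) -> Set[str]:
    """Required placeholders that the template does not contain."""
    return REQUIRED_VARIABLES - template_variables(template)
```

Calling `missing_variables(PROMPT_TEMPLATE)` returns an empty set when nothing is missing.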