🦙LlamaIndex 써보기

LLM 활용을 위한 데이터 프레임워크다~

랭체인은 써 봤는데

박사님이 라마인덱스도 써보라고 하셨다

🦙TODO

- 박사님 깃허브 코드 보기
- 공식문서 보고 기본 세팅해보기
- 뭐든 한번 돌려보기

🦙박사님 깃허브 코드 보기

엥 분명 애플리케이션 레포가 있었던 것 같은데

없어진건가?

🦙 공식문서 보고 기본 세팅해보기 + 뭐든 한번 돌려보기

LlamaIndex - LlamaIndex

Welcome to LlamaIndex 🦙 ! LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. Some popular use cases include the followi

docs.llamaindex.ai

LlamaIndex is a framework for building context-augmented LLM applications

🦙 LlamaIndex는 컨텍스트 증강 LLM 앱을 위한 데이터 프레임워크

활용법

Question-Answering Chatbots (commonly referred to as RAG systems, which stands for "Retrieval-Augmented Generation")
Document Understanding and Extraction
Autonomous Agents that can perform research and take actions

Context Augmentation 검색 증강

LLM(대형 언어 모델)은 인간과 데이터 사이의 자연어 인터페이스를 제공

널리 사용 가능한 모델은 공개적으로 이용할 수 있는 대량의 데이터로 사전 학습

=> 특정 문제를 해결하기 위해 필요한 비공개 데이터나 특정 데이터로 학습되지 않음

LlamaIndex는 컨텍스트 증강을 가능하게 하는 도구를 제공

- 추론 시 LLM과 컨텍스트를 결합하는 Retrieval-Augmented Generation (RAG)

- 파인 튜닝(finetuning)

LlamaIndex가 제공하는 도구

Data connectors : API, PDF, SQL 등 기존 데이터 소스를 원래 형식 그대로 수집
Data indexes : 데이터를 LLM이 쉽게 소비할 수 있도록 중간 표현으로 구조화
Engines : 데이터에 자연어로 접근할 수 있게 함

https://docs.llamaindex.ai/en/stable/getting_started/installation/

Installation and Setup - LlamaIndex

Installation and Setup The LlamaIndex ecosystem is structured using a collection of namespaced packages. What this means for users is that LlamaIndex comes with a core starter bundle, and additional integrations can be installed as needed. A complete list

docs.llamaindex.ai

다운로드!!

pip install llama-index

+ openai도

pip install openai

시작 튜토리얼이 openAI도 있고

로컬모델도 있다!

우선 위에 것을 해 보고

로컬 모델을 다운받은 후 두번째 것도 보자

🦙 코딩

https://docs.llamaindex.ai/en/stable/getting_started/starter_example/

Starter Tutorial (OpenAI) - LlamaIndex

Starter Tutorial (OpenAI) This is our famous "5 lines of code" starter example using OpenAI. Tip Make sure you've followed the installation steps first. Download data This example uses the text of Paul Graham's essay, "What I Worked On". This and many othe

docs.llamaindex.ai

디렉토리 구조 이렇게 잡고

├── starter.py
├── data
    └── 문서명.txt
└── secret.py

코드

import secret

import logging
import sys
import os.path
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
import openai

openai.api_key = secret.OPENAI_API_KEY  # secret 모듈에서 API 키 가져오기

# 내부 동작 확인용 로그
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


# 기본 코드

# # 데이터 로드
# documents = SimpleDirectoryReader("data").load_data()
# # 인덱스 생성
# index = VectorStoreIndex.from_documents(documents)


# 인덱스 저장소 이용 - 시간/돈 절약하기
# 저장소가 이미 존재하는지 확인
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # 기존 인덱스를 로드
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)


# 인덱스에 대한 질문 응답 엔진 생성
query_engine = index.as_query_engine()
# 질문하기
response = query_engine.query("정은채의 독서 목표에 대해 알려줘")
print(response)

> Adding chunk: > 청크를 저장하고 있다

저장소도 만들어짐

secret에 openAI API 넣고 그냥 불러오기만 하면 된다. 랭체인처럼 뭔가 따로 모델을 불러와서 검색기랑 저장소랑 체인으로 걸 필요가 없다고 한다

llama_index 라이브러리는 자동으로 OpenAI 모델을 설정할 수 있으므로, 모델을 직접 설정할 필요는 없습니다

🦙 결과

Q.
정은채의 독서 목표에 대해 알려줘

A.
정은채의 독서 목표는 '왜 나는 너를 사랑하는가 - 알랭 드 보통', '군주론 - 마키아벨리', '타인의 고통 - 수전 손택', '다정한 것이 살아남는다 - 어..누구더라' 책들을 아직 읽지 않았으며, '참을 수 없는 존재의 가벼움 - 밀란 쿤데라'와 '사양 - 다자이 오사무' 책들은 이미 완독한 것으로 나타납니다.

🦙 과정

DEBUG:llama_index.core.readers.file.base:> [SimpleDirectoryReader] Total files added: 1

SimpleDirectoryReader가 데이터 디렉토리에서 파일을 읽어들여 총 1개의 파일을 추가했다는 메시지

DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/data/goldchaeBucketList.txt

지정된 경로에서 파일을 열고 있다는 메시지

DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: # 정은채의 2024-2분기 (1학기) 버킷리스트 ...

파일의 내용을 청크 단위로 추가하고 있다는 메시지

(각 청크는 인덱스를 생성하기 위해 나누어진 데이터의 일부)

DEBUG:openai._base_client:Sending HTTP Request: POST https://api.openai.com/v1/embeddings

OpenAI API에 임베딩 요청을 보낼 때 사용한 요청 옵션

DEBUG:httpcore.connection:connect_tcp.started host='api.openai.com' port=443 local_address=None timeout=60.0 socket_options=None

OpenAI API에 HTTP POST 요청을 보내고 있다는 메시지

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create.<locals>.parser at 0x138664280>, 'json_data': {'input': ['file_path: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/data/goldchaeBucketList.txt ...'], 'model': 'text-embedding-ada-002', 'encoding_format': 'base64'}}

API 서버와의 TCP 연결을 시작하고 있다는 메시지

DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x11fe99ee0>

TCP 연결이 완료되었음

DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x11fe80840> server_hostname='api.openai.com' timeout=60.0
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x11feaf0a0>

SSL/TLS 설정을 시작하고 완료되었음

DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete

HTTP 요청 헤더와 본문을 전송하고 있다는 메시지

DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', ...)
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:HTTP Response: POST https://api.openai.com/v1/embeddings "200 OK" Headers([('date', 'Fri, 17 May 2024 15:33:46 GMT'), ('content-type', 'application/json'), ...])

API 서버로부터 HTTP 응답을 수신하고 완료되었음을 나타냄

응답 상태는 200 OK로 성공적인 요청임을 나타냄

DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/storage/docstore.json
DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/storage/index_store.json
DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/storage/graph_store.json
DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/storage/default__vector_store.json
DEBUG:fsspec.local:open file: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/storage/image__vector_store.json

인덱스를 저장하는 파일을 열고 있다는 메시지

이는 인덱스 데이터를 로컬 디스크에 저장하여 나중에 재사용할 수 있게 함

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': "You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines."}, {'role': 'user', 'content': 'Context information is below.\n---------------------\nfile_path: /Users/jung-eunchae/Desktop/code/LlamaIndex/StarterTutorial/data/goldchaeBucketList.txt\n...Query: 정은채의 독서 목표에 대해 알려줘\nAnswer: '}], 'model': 'gpt-3.5-turbo', 'stream': False, 'temperature': 0.1}}
DEBUG:openai._base_client:Sending HTTP Request: POST https://api.openai.com/v1/chat/completions
...
DEBUG:openai._base_client:HTTP Response: POST https://api.openai.com/v1/chat/completions "200 OK" Headers([('date', 'Fri, 17 May 2024 15:33:50 GMT'), ('content-type', 'application/json'), ...])

질문 응답을 위해 OpenAI Chat API에 요청을 보내고 응답을 수신하고 있다는 메시지

여기서 model은 gpt-3.5-turbo를 사용하고 있음

난

랭체인이좋은것같아

저작자표시

'🤖 AI > AI' 카테고리의 다른 글

🔬LLM 지도 : 딥러닝과 언어 모델링 (0)	2024.09.23
태 대학 보내기 - 📐 embedding (0)	2024.06.15
🌷구축형 AI 환경 세팅하기2 🌷 (2)	2024.05.16
🦙 LlamaIndex - RAG 정리 (0)	2024.05.10
🦜langchain - RAG 정리 (2)	2024.05.10

은체공부

🦙LlamaIndex 써보기

🦙TODO

🦙박사님 깃허브 코드 보기

🦙 공식문서 보고 기본 세팅해보기 + 뭐든 한번 돌려보기

Context Augmentation 검색 증강

🦙 코딩

🦙 결과

🦙 과정

'🤖 AI > AI' 카테고리의 다른 글

티스토리툴바

🦙LlamaIndex 써보기

🦙TODO

🦙박사님 깃허브 코드 보기

🦙 공식문서 보고 기본 세팅해보기 + 뭐든 한번 돌려보기

Context Augmentation 검색 증강

🦙 코딩

🦙 결과

🦙 과정

'🤖 AI > AI' 카테고리의 다른 글

관련글

티스토리툴바