RAG: Retrieval Augmented Generation


How do we solve this?

Idea:

Add the right information to the prompt.

Principe:

<Question>

prends en compte le texte ci-dessous pour répondre

<Information, retrieved>

So, super simple.
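
For illustration, here is a minimal sketch of the prompt assembly. The wording and the `build_prompt` helper are my own assumptions, not a fixed recipe:

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble the RAG prompt: the question, an instruction, then the retrieved context."""
    context = "\n\n".join(retrieved_docs)
    return (
        f"{question}\n\n"
        "Take the text below into account when answering:\n\n"
        f"{context}"
    )
```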

The difficulty is not in the prompt but in the retrieval part.

How do we identify the documents relevant to the question / query?


First approach

We start from a corpus of documents.

The user asks their question: the query.

Distance between two vectorized texts: cosine similarity.
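
A minimal sketch of this first approach, assuming the document and query embeddings are already computed (the `retrieve` helper is only an illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k documents closest to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```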

At this point, two questions are still open: how to chunk the documents, and how to vectorize them.

Chunking

See my article on the subject.

There are several parameters we can play with.
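
For example, a naive fixed-size chunker with overlap (the sizes below are arbitrary assumptions, not recommendations):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a text into overlapping, fixed-size character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```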

Vectorization

There are many vectorization (embedding) models available.

Here is the selection strategy according to Mistral:

Points 1, 2, 3, and 7 seem the most important to me.


Strategy for choosing an embedding model for RAG

1. Languages

2. Chunk size

3. Open source vs. proprietary

4. Local installation vs. API

5. Performance

6. Domain adaptation

7. Evaluation tools

Key takeaways:


Thanks, Mistral.
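
To make this concrete, a minimal sketch of computing embeddings with an open-source model via the sentence-transformers library. The model name is only an example, not a recommendation:

```python
from sentence_transformers import SentenceTransformer

# Example open-source multilingual embedding model (assumption; pick per the criteria above)
model = SentenceTransformer("intfloat/multilingual-e5-small")

docs = ["First document chunk...", "Second document chunk..."]
query = "What does the first document say?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# With L2-normalized vectors, the dot product equals the cosine similarity
scores = doc_vecs @ query_vec
```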

New element: how do we evaluate the results?

But one question remains:

How many documents should we include in the prompt?

100% empirical.

Reranking

The system returns N documents whose content is similar to the original question, based on a vectorization model that is supposed to preserve the semantics of the documents. These models are generic.

Global Embedding Models (Fine-Tuned for Semantic Similarity)

Output: A vector representation (embedding) for each text input. The dimensionality of this vector is fixed (e.g., 768 dimensions). This vector is meant to represent the overall semantic meaning of the text.

Efficiency: Optimized for fast encoding, as it needs to process the entire corpus and encode queries in real time.

Global Semantic Space: Aims to create a consistent and meaningful global representation of semantic relationships across all texts.

Similarity Metric: Relies on a distance metric (e.g., cosine similarity, dot product) to compare the embeddings and determine similarity.

Reranking Models (Fine-Tuned for Semantic Similarity), including Cross-Encoders

Technical Goal: To directly assess the semantic relatedness between a specific query and a specific document.

The model learns to predict a relevance score for each query-document pair.

Loss functions like mean squared error (MSE) or cross-entropy loss are used to train the model to accurately predict these relevance scores.

Fine-Tuning Approach (Cross-Encoders):

Architecture: A cross-encoder takes both the query and the document as input simultaneously. The query and document are fed into the same transformer network (e.g., BERT, RoBERTa) and processed together. This allows the model to directly attend to the interactions between the query and the document.

Joint Encoding (Cross-Encoders): The query and document are processed together, allowing the model to capture fine-grained interactions and contextual relationships.

Higher Accuracy, Higher Cost: Generally more accurate than embedding models for relevance ranking but also significantly more computationally expensive. They are not suitable for encoding the entire corpus.
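
As an illustration, reranking with a cross-encoder from sentence-transformers; the checkpoint name and the documents are placeholders:

```python
from sentence_transformers import CrossEncoder

# Example cross-encoder checkpoint (assumption)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does retrieval augmented generation work?"
candidates = [
    "Document returned by the embedding search #1 ...",
    "Document returned by the embedding search #2 ...",
    "Document returned by the embedding search #3 ...",
]

# Each (query, document) pair is encoded jointly and given a relevance score
scores = reranker.predict([(query, doc) for doc in candidates])

# Keep only the top-scoring documents for the final prompt
reranked = [doc for _, doc in sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)]
```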

Recap

Evaluation with RAGAS
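
A rough sketch of what a RAGAS evaluation call can look like. This follows the older 0.1-style ragas interface and all data are placeholders; check the API of the version you actually install:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Placeholder evaluation data: one question with its retrieved contexts and generated answer
data = Dataset.from_dict({
    "question": ["What is RAG?"],
    "answer": ["RAG adds retrieved documents to the prompt before generation."],
    "contexts": [["RAG stands for Retrieval Augmented Generation..."]],
    "ground_truth": ["RAG augments the prompt with retrieved documents."],
})

# Note: ragas uses an LLM judge under the hood (an OpenAI key by default)
result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)
```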

Vector Store

How do we match two vectors as quickly as possible?

Computing the cosine similarity between the query and every document takes too long.
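
The usual answer is an approximate nearest-neighbour index. A minimal sketch with FAISS, where the dimension, parameters, and data are placeholders:

```python
import numpy as np
import faiss

d = 768  # embedding dimension (example value)
doc_vecs = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(doc_vecs)  # normalize so that inner product == cosine similarity

# HNSW graph index: approximate search instead of brute force over all documents
index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_vecs)

query_vec = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query_vec)
scores, ids = index.search(query_vec, 5)  # the 5 nearest documents
```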