A few weeks ago, I built a personal project focused on making AI more useful for PDF files. Basically, I used RAG (Retrieval-Augmented Generation) to make the LLM answer questions using only data from the imported PDF files. Because the task is so general, it was really hard to craft a specific prompt for it. I spent almost a whole month before settling on one. Below is the prompt that I used:
“””You are an AI assistant tasked with providing detailed answers based solely on the given context. Your goal is to analyze the information provided and formulate a comprehensive, well-structured response to the question.
The context will be passed as “Context:”
The user question will be passed as “Question:”
To answer the question:
- Thoroughly analyze the context, identifying key information relevant to the question.
- Organize your thoughts and plan your response to ensure a logical flow of information.
- Formulate a detailed answer that directly addresses the question, using only the information provided in the context.
- Ensure your answer is comprehensive, covering all relevant aspects found in the context.
- If the context doesn’t contain sufficient information to fully answer the question, state this clearly in your response.
- Use clear, concise language.
- Organize your answer into paragraphs for readability.
- Use bullet points or numbered lists where appropriate to break down complex information.
- If relevant, include any headings or subheadings to structure your response.
- Ensure proper grammar, punctuation, and spelling throughout your answer.
Important: Base your entire response solely on the information provided in the context. Do not include any external knowledge or assumptions not present in the given text.”””
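At query time, this system prompt gets combined with the retrieved chunks and the user's question under the “Context:” and “Question:” labels it expects. A minimal sketch of that assembly (the function name and OpenAI-style message format are my own assumptions, not the project's actual code):

```python
# The full system prompt from above, abridged here for brevity.
SYSTEM_PROMPT = (
    "You are an AI assistant tasked with providing detailed answers "
    "based solely on the given context. ..."
)

def build_messages(context_chunks, question):
    """Assemble chat messages: the system prompt first, then the retrieved
    context and the user's question under the labels the prompt expects."""
    context = "\n\n".join(context_chunks)
    user_content = f"Context:\n{context}\n\nQuestion:\n{question}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The list returned here is what would be passed as the `messages` argument of a chat-completion call.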
Because my AI does not generate answers for one specific topic or field of study, I figured I could extract the data from the PDF file as chunks of information and then store them in my retrieval store. To add more depth to the generated answers, I provided a list of questions in my prompt so that the LLM could also auto-generate more chunks of data from the previous data. To reduce hallucination, I told the LLM not to draw on any external source: “Base your entire response solely on the information provided in the context. Do not include any external knowledge or assumptions not present in the given text.”
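The chunk-and-retrieve step described above can be sketched roughly like this. A real pipeline would use embedding similarity against a vector store, so the keyword-overlap scoring below is only a hypothetical stand-in, and all names are my own:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split extracted PDF text into fixed-size, overlapping chunks
    so that information spanning a boundary is not lost."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def retrieve(chunks, question, top_k=3):
    """Rank chunks by how many question words they share and return the
    top_k. A toy stand-in for embedding-based similarity search."""
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:top_k]
```

The retrieved chunks are then what gets pasted into the prompt as the “Context:” section.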
To make sure the LLM follows the intended set of instructions, I added some specific formatting commands so that the generated output would stick to what I wanted:
“””
Use clear, concise language.
Organize your answer into paragraphs for readability.
Use bullet points or numbered lists where appropriate to break down complex information.
If relevant, include any headings or subheadings to structure your response.
Ensure proper grammar, punctuation, and spelling throughout your answer.
“””
This follows the “Cognitive Verifier Pattern” mentioned on page 10 of the White et al. article. It helps ensure that the LLM follows the user's specific requirements.
This is quite an interesting experiment that can minimize possible information errors. When I asked ChatGPT to summarize our Perusall reading by Arriagada, it included information that I couldn't find anywhere in the article, which I believe came from some unrelated source. This prompt is a good way to make the AI focus on discussing only the information that we provide.
This is a very cool way to use LLMs. AI is notorious for “hallucination,” confidently describing things that don't exist. Many times I have tried a similar prompt, and it either worked how I wanted or gave me incorrect information. Learning to steer the AI toward the results you want is an important part of learning how to prompt.