Prompt Engineering

Prompt engineering is the art and science of designing inputs that guide large language models toward desired outputs. Well-designed prompts can substantially improve task performance without any parameter updates.

Why Prompt Engineering Matters

The same model can produce vastly different outputs depending on how it is prompted. Prompt engineering bridges the gap between model capability and user intent.

| Approach | Description | Cost | Performance |
| --- | --- | --- | --- |
| Zero-shot | No examples provided | Lowest | Baseline |
| One-shot | Single example | Low | Improved |
| Few-shot | Multiple examples | Moderate | Good |
| Chain-of-thought | Step-by-step reasoning | Moderate | Excellent |
| Self-consistency | Multiple reasoning paths | High | Best |

Prompt Taxonomy


Zero-Shot Prompting

Zero-shot prompting relies entirely on the model's pre-trained knowledge with no task-specific examples.

# Zero-shot classification
def zero_shot_classify(text, labels, model):
    prompt = f"""Classify the following text into one of these categories: {', '.join(labels)}.

Text: {text}

Category:"""

    response = model.generate(prompt)
    return response.strip()

# Zero-shot sentiment analysis
# `llm` is assumed to be any object exposing generate(prompt) -> str.
text = "The movie had stunning visuals but a predictable plot."
labels = ["positive", "negative", "neutral"]

result = zero_shot_classify(text, labels, llm)
print(result)  # e.g. "neutral" (the actual output depends on the model)
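
The function above returns the raw model string, which may not match any label exactly (for example "Neutral." or "The sentiment is neutral"). A minimal sketch of mapping free-form output back onto the label set follows; `normalize_label` is a hypothetical helper, not part of the original lesson, and its substring fallback is deliberately naive (it would accept "not positive" as "positive").

```python
def normalize_label(response, labels, default=None):
    """Map a free-form model response onto one of the allowed labels."""
    cleaned = response.strip().lower()
    # Prefer an exact match after trimming surrounding punctuation.
    for label in labels:
        if cleaned.strip(" .!\"'") == label.lower():
            return label
    # Otherwise accept the response only if exactly one label appears in it.
    hits = [label for label in labels if label.lower() in cleaned]
    return hits[0] if len(hits) == 1 else default
```

Returning `default` on ambiguous output makes failures visible instead of silently passing an invalid category downstream.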

Zero-Shot Effectiveness: As a rough heuristic, the probability of a correct zero-shot prediction scales with the overlap between the pre-training data and the target task:

$$P(\text{correct}) \propto \frac{|D_{\text{pretrain}} \cap D_{\text{task}}|}{|D_{\text{task}}|}$$

where $D_{\text{pretrain}}$ is the pre-training data distribution and $D_{\text{task}}$ is the target task distribution.


Few-Shot Prompting

Few-shot prompting provides task examples that demonstrate the desired input-output mapping.

# Few-shot prompt template
def create_few_shot_prompt(examples, query, task_description=""):
    """
    Construct a few-shot prompt with task description and examples.

    Args:
        examples: list of (input, output) tuples
        query: the target input to classify/generate
        task_description: optional task explanation
    """
    prompt = ""
    if task_description:
        prompt += f"{task_description}\n\n"

    for i, (inp, out) in enumerate(examples, 1):
        prompt += f"Example {i}:\nInput: {inp}\nOutput: {out}\n\n"

    prompt += f"Input: {query}\nOutput:"
    return prompt

# Sentiment analysis examples
examples = [
    ("This product is amazing! Best purchase ever.", "Positive"),
    ("Terrible quality. Broke after one day.", "Negative"),
    ("It works as described. Nothing special.", "Neutral"),
]

query = "Not bad, but I expected better for the price."
prompt = create_few_shot_prompt(
    examples, query,
    task_description="Classify each review as Positive, Negative, or Neutral."
)
print(prompt)

Example Selection Strategies

| Strategy | Description | Use Case |
| --- | --- | --- |
| Random sampling | Randomly select from training data | General purpose |
| Stratified | Balanced representation of classes | Classification |
| Semantic similarity | Select examples similar to query | Domain-specific |
| Diversity-based | Maximize coverage of input space | Complex tasks |
| Hard negatives | Include challenging examples | Edge cases |

import numpy as np
from sentence_transformers import SentenceTransformer

class FewShotExampleSelector:
    def __init__(self, examples, model_name="all-MiniLM-L6-v2"):
        self.examples = examples
        self.encoder = SentenceTransformer(model_name)
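        # Normalize embeddings so the dot products below are cosine similarities
        self.embeddings = self.encoder.encode(
            [ex[0] for ex in examples], normalize_embeddings=True
        )

    def select(self, query, k=4, strategy="similarity"):
        query_embedding = self.encoder.encode([query], normalize_embeddings=True)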

        if strategy == "similarity":
            # Select most similar examples
            similarities = np.dot(self.embeddings, query_embedding.T).flatten()
            indices = np.argsort(similarities)[-k:][::-1]

        elif strategy == "diversity":
            # Maximal marginal relevance for diversity
            selected = []
            candidates = list(range(len(self.examples)))

            # Cap at the pool size so best_idx is never None when k > len(examples)
            for _ in range(min(k, len(candidates))):
                best_idx = None
                best_score = -np.inf
                for idx in candidates:
                    sim_to_query = np.dot(self.embeddings[idx], query_embedding.T).item()
                    sim_to_selected = max(
                        [np.dot(self.embeddings[idx], self.embeddings[s]).item() for s in selected]
                    ) if selected else 0
                    score = sim_to_query - 0.5 * sim_to_selected
                    if score > best_score:
                        best_score = score
                        best_idx = idx
                selected.append(best_idx)
                candidates.remove(best_idx)
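            indices = selected

        else:
            # Fail loudly instead of hitting an undefined `indices` below
            raise ValueError(f"Unknown strategy: {strategy!r}")

        return [self.examples[i] for i in indices]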

# Usage
# `training_examples` is assumed to be a list of (text, label) tuples defined elsewhere.
selector = FewShotExampleSelector(training_examples)
selected = selector.select("This phone has an incredible camera!", k=3)
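
The selector above covers the similarity and diversity strategies. The stratified strategy from the table can be sketched without embeddings, as below. `stratified_select` and its `seed` parameter are hypothetical names introduced for illustration, and the round-robin over labels is one simple way to balance classes, not the only one.

```python
import random
from collections import defaultdict

def stratified_select(examples, k, seed=0):
    """Pick up to k (text, label) examples, cycling over labels for balance."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    for group in by_label.values():
        rng.shuffle(group)

    selected = []
    labels = sorted(by_label)
    # Round-robin over labels until k examples are chosen or the pool is empty.
    while len(selected) < k and any(by_label[lab] for lab in labels):
        for lab in labels:
            if by_label[lab] and len(selected) < k:
                selected.append(by_label[lab].pop())
    return selected
```

With a pool skewed toward one class, this still yields one example per class before any class is repeated.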

Chain-of-Thought (CoT) Prompting

CoT prompting elicits step-by-step reasoning before the final answer, which often improves performance on multi-step tasks such as arithmetic and logical reasoning.

Definition: CoT Reasoning Process

Given input $x$, the model generates a reasoning chain $r = (r_1, r_2, \ldots, r_k)$ before producing the final answer $a$:

$$P(a | x) = \sum_{r} P(a | x, r) \cdot P(r | x)$$

where $P(r | x)$ is the probability of the reasoning chain given the input.

# Standard prompt vs Chain-of-Thought
standard_prompt = """Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?

Answer: 11"""

cot_prompt = """Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?

Let's think step by step:
1. Roger starts with 5 tennis balls.
2. He buys 2 cans, each containing 3 tennis balls.
3. Total new balls: 2 × 3 = 6 tennis balls.
4. Total balls: 5 + 6 = 11 tennis balls.

Answer: 11"""
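
In few-shot CoT, worked examples like the one above are placed before the target question so the model imitates the reasoning format. A minimal sketch of assembling such a prompt follows; `build_cot_prompt` and the `(question, steps, answer)` exemplar layout are assumptions made for illustration.

```python
def build_cot_prompt(exemplars, question):
    """Build a few-shot CoT prompt from (question, steps, answer) exemplars."""
    parts = []
    for q, steps, answer in exemplars:
        numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
        parts.append(
            f"Question: {q}\n\nLet's think step by step:\n{numbered}\n\nAnswer: {answer}"
        )
    # End on the reasoning cue so the model continues with its own steps.
    parts.append(f"Question: {question}\n\nLet's think step by step:")
    return "\n\n".join(parts)

exemplars = [(
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
    ["Roger starts with 5 tennis balls.",
     "He buys 2 cans, each containing 3 tennis balls.",
     "Total new balls: 2 × 3 = 6 tennis balls.",
     "Total balls: 5 + 6 = 11 tennis balls."],
    "11",
)]
prompt = build_cot_prompt(exemplars, "A baker has 4 trays of 6 rolls. How many rolls?")
```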

Zero-Shot CoT

def zero_shot_cot(question, model):
    """Elicit reasoning with 'Let's think step by step'."""

    # Step 1: Generate reasoning
    reasoning_prompt = f"""Question: {question}

Let's think step by step:"""

    reasoning = model.generate(reasoning_prompt)

    # Step 2: Extract answer
    answer_prompt = f"""Question: {question}

{reasoning}

Therefore, the answer is:"""

    answer = model.generate(answer_prompt)
    return {"reasoning": reasoning, "answer": answer}

question = "A train travels 60 mph for 2.5 hours. How far does it travel?"
result = zero_shot_cot(question, llm)
print(f"Reasoning: {result['reasoning']}")
print(f"Answer: {result['answer']}")

Self-Consistency

Self-consistency generates multiple reasoning paths and selects the most common answer through majority voting.

Definition: Self-Consistency Decoding

Generate $M$ reasoning paths $\{r^{(1)}, r^{(2)}, \ldots, r^{(M)}\}$, each producing answer $a^{(i)}$. The final answer is:

$$a^* = \arg\max_{a} \sum_{i=1}^{M} \mathbb{1}[a^{(i)} = a]$$

The paths are sampled with temperature $T > 0$ to encourage diverse reasoning.

import torch
from collections import Counter

class SelfConsistencyDecoder:
    def __init__(self, model, tokenizer, num_paths=5, temperature=0.7):
        self.model = model
        self.tokenizer = tokenizer
        self.num_paths = num_paths
        self.temperature = temperature

    def decode(self, prompt):
        """Generate multiple reasoning paths and vote on answer."""
        answers = []
        reasoning_paths = []

        for _ in range(self.num_paths):
            # Generate with temperature for diversity
            input_ids = self.tokenizer.encode(prompt, return_tensors="pt")

            output = self.model.generate(
                input_ids,
                max_new_tokens=256,
                temperature=self.temperature,
                top_p=0.9,
                do_sample=True
            )

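            # Decode only the newly generated tokens; for decoder-only models
            # output[0] begins with the prompt, which would pollute answer extraction.
            new_tokens = output[0][input_ids.shape[-1]:]
            response = self.tokenizer.decode(new_tokens, skip_special_tokens=True)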
            answer = self.extract_answer(response)

            answers.append(answer)
            reasoning_paths.append(response)

        # Majority voting
        vote_counts = Counter(answers)
        final_answer = vote_counts.most_common(1)[0][0]
        confidence = vote_counts[final_answer] / self.num_paths

        return {
            "answer": final_answer,
            "confidence": confidence,
            "vote_distribution": dict(vote_counts),
            "reasoning_paths": reasoning_paths
        }

    def extract_answer(self, response):
        """Extract the final answer from the response."""
        lines = response.strip().split("\n")
        for line in reversed(lines):
            if "answer" in line.lower():
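                # Extract text after the last colon or the last standalone "is"
                # (splitting on a bare "is" would also split words like "this")
                if ":" in line:
                    answer = line.split(":")[-1]
                elif " is " in line:
                    answer = line.split(" is ")[-1]
                else:
                    continue
                # Drop a trailing period so "11" and "11." count as the same vote
                return answer.strip().rstrip(".")
        return lines[-1].strip()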
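
Majority voting only works if equivalent answers are counted together, so "11", "11." and "11.0" should land in the same bucket. The sketch below isolates the voting step from model calls. `normalize_answer` and `majority_vote` are hypothetical helpers, and reducing an answer to its last number is an assumption suited to arithmetic questions only.

```python
import re
from collections import Counter

def normalize_answer(text):
    """Reduce an answer to its last number if present, else lowercased text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    if numbers:
        num = numbers[-1]
        # Treat "11.0" and "11" as the same value.
        return num.rstrip("0").rstrip(".") if "." in num else num
    return text.strip().strip(".").lower()

def majority_vote(raw_answers):
    """Return (winning answer, fraction of paths that agreed with it)."""
    votes = Counter(normalize_answer(a) for a in raw_answers)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(raw_answers)

answer, agreement = majority_vote(["11", "11.", "The answer is 11.0", "12", "eleven"])
```

The agreement fraction corresponds to the `confidence` value in the class above. It measures agreement among samples, not calibrated correctness.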

ReAct (Reasoning + Acting)

ReAct interleaves reasoning traces with actions, enabling models to interact with external tools.

# ReAct prompt template
REACT_TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {question}
Thought:"""

# Example ReAct interaction
react_example = """
Question: What is the capital of the country where the Eiffel Tower is located?
Thought: I need to find where the Eiffel Tower is located first.
Action: search
Action Input: Eiffel Tower location
Observation: The Eiffel Tower is located in Paris, France.
Thought: Now I know the country is France. I need to find the capital of France.
Action: search
Action Input: capital of France
Observation: The capital of France is Paris.
Thought: I now know the final answer.
Final Answer: Paris
"""

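The template above only defines the text format; something still has to parse each `Action` and feed the `Observation` back to the model. A minimal sketch of that control loop follows, with a scripted stand-in for the model. `parse_action`, `react_loop`, and the `generate` callable are assumptions made for illustration. A real setup would normally also stop generation at "Observation:" so the model cannot invent tool results.

```python
import re

def parse_action(text):
    """Return (tool, tool_input) from the last Action block, or None."""
    matches = re.findall(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", text)
    return matches[-1] if matches else None

def react_loop(question, generate, tools, max_steps=5):
    """Alternate model calls and tool calls until a Final Answer appears."""
    transcript = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        step = generate(transcript)
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        action = parse_action(step)
        if action is None:
            break  # The model produced neither an action nor an answer.
        tool, tool_input = action
        tool_fn = tools.get(tool.strip())
        observation = tool_fn(tool_input.strip()) if tool_fn else f"Unknown tool: {tool}"
        transcript += f"\nObservation: {observation}\nThought:"
    return None

# Scripted stand-in for a model, used only to exercise the loop.
scripted = iter([
    " I need to look this up.\nAction: search\nAction Input: capital of France",
    " I now know the final answer.\nFinal Answer: Paris",
])
tools = {"search": lambda q: "The capital of France is Paris."}
answer = react_loop("What is the capital of France?", lambda t: next(scripted), tools)
```

The `max_steps` cap keeps a model that never emits a final answer from looping forever.
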
Prompt Optimization

Definition: Automatic Prompt Optimization

Given a set of training examples $\mathcal{D} = \{(x_i, y_i)\}$, find the prompt $p^*$ that maximizes:

$$p^* = \arg\max_{p \in \mathcal{P}} \sum_{(x_i, y_i) \in \mathcal{D}} \log P(y_i | p, x_i)$$

where $\mathcal{P}$ is the space of possible prompts.

from typing import List, Dict
import random

class PromptOptimizer:
    """Simple prompt optimizer using exhaustive search over templates and demonstration sets."""

    def __init__(self, model, eval_data):
        self.model = model
        self.eval_data = eval_data
        self.templates = [
            "Classify: {input}\nCategory:",
            "What category does this belong to?\n{input}\nAnswer:",
            "Task: Classify the following text.\nText: {input}\nClass:",
        ]
        self.demonstration_sets = [...]  # Pre-generated

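    def optimize(self, num_rounds=5, beam_width=3):
        """Find the best (template, demonstrations) pair by exhaustive search.

        num_rounds and beam_width are accepted but unused here; a real beam
        search would use them to prune and iterate over candidates.
        """
        # Start below any achievable score so a result is returned even at 0 accuracy
        best_score = float("-inf")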
        best_template = None
        best_demos = None

        for template in self.templates:
            for demo_set in self.demonstration_sets:
                score = self.evaluate(template, demo_set)
                if score > best_score:
                    best_score = score
                    best_template = template
                    best_demos = demo_set

        return {
            "template": best_template,
            "demonstrations": best_demos,
            "score": best_score
        }

    def evaluate(self, template, demos, k=50):
        """Evaluate prompt on held-out data."""
        correct = 0
        for x, y in self.eval_data[:k]:
            prompt = self.format_prompt(template, demos, x)
            pred = self.model.generate(prompt)
            if self.match(pred, y):
                correct += 1
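        # Guard against division by zero when eval_data is empty
        return correct / max(1, min(k, len(self.eval_data)))

    def format_prompt(self, template, demos, x):
        """Prepend demonstrations to the filled template (one assumed layout)."""
        # Assumes demos is a list of (input, output) pairs.
        demo_text = "".join(f"{template.format(input=i)} {o}\n\n" for i, o in demos)
        return demo_text + template.format(input=x)

    def match(self, pred, y):
        """Case-insensitive exact match (one possible correctness criterion)."""
        return pred.strip().lower() == str(y).strip().lower()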
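
Note that the class scores prompts by accuracy rather than by the log-likelihood in the definition, since many model APIs return text but not token probabilities. The same search can be seen end to end with a toy stand-in for the model, sketched below. `pick_best_template`, `toy_generate`, and the two-item evaluation set are hypothetical and exist only to make the example runnable.

```python
def pick_best_template(templates, eval_data, generate):
    """Return (best_template, accuracy) using exact-match accuracy."""
    best_template, best_score = None, float("-inf")
    for template in templates:
        correct = sum(
            generate(template.format(input=x)).strip().lower() == y.lower()
            for x, y in eval_data
        )
        score = correct / max(1, len(eval_data))
        if score > best_score:
            best_template, best_score = template, score
    return best_template, best_score

def toy_generate(prompt):
    # Stand-in model: it only detects spam when the prompt ends with "Class:".
    return "spam" if prompt.endswith("Class:") and "prize" in prompt else "ham"

templates = [
    "Classify: {input}\nCategory:",
    "What category does this belong to?\n{input}\nAnswer:",
    "Task: Classify the following text.\nText: {input}\nClass:",
]
eval_data = [("Win a prize now", "spam"), ("Lunch at noon?", "ham")]
best, score = pick_best_template(templates, eval_data, toy_generate)
```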

Best Practices Summary

| Principle | Description |
| --- | --- |
| Be specific | Clear, unambiguous instructions |
| Provide examples | Few-shot demonstrations help |
| Structure output | Define expected format explicitly |
| Use delimiters | Separate instructions from content |
| Iterate | Test and refine prompts empirically |
| Chain reasoning | CoT for complex multi-step tasks |
| Self-consistency | Vote across multiple reasoning paths |
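
Two of these principles, "Use delimiters" and "Structure output", combine naturally: wrap the content in explicit tags and ask for a fixed JSON shape that can be validated. A minimal sketch follows; the `<content>` tag name and both helper functions are assumptions made for illustration, not a standard.

```python
import json

def build_structured_prompt(instruction, content, fields):
    """Wrap content in delimiters and request a JSON object with given keys."""
    keys = ", ".join(f'"{f}"' for f in fields)
    return (
        f"{instruction}\n"
        f"Respond with a JSON object containing exactly these keys: {keys}.\n\n"
        f"<content>\n{content}\n</content>"
    )

def parse_structured_output(response, fields):
    """Parse the text from the first '{' to the last '}'; None if invalid."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject outputs with missing or unexpected keys.
    if not isinstance(data, dict) or set(data) != set(fields):
        return None
    return data

fields = ["sentiment", "confidence"]
prompt = build_structured_prompt("Classify the review.", "Not bad for the price.", fields)
parsed = parse_structured_output('Sure! {"sentiment": "neutral", "confidence": 0.7}', fields)
```

Returning `None` on malformed output gives the caller a clear signal to retry or fall back.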

Key Takeaways

  • Zero-shot works well when the model has strong task priors
  • Few-shot examples should be diverse and representative
  • Chain-of-thought dramatically improves reasoning tasks
  • Self-consistency improves reliability through majority voting
  • ReAct enables tool use and grounded reasoning
  • Always test prompts systematically on held-out examples