
How AI APIs Work: Response Types, Hallucination, and How to Handle Them


What Are AI APIs?

AI APIs are cloud-based interfaces that allow us to access and use artificial intelligence models — like language, vision, or speech models — without needing to train or host them on our own servers.

Think of AI APIs as a teacher who knows it all and can reply with text, images, voice, and much more — all instantly. And the best part? We don’t need to know how they do it; we just ask and get the answer.

Where they’re used

AI APIs are everywhere, from chatbots that talk like humans to tools that write your blogs or ads for you. They power virtual assistants, create stunning images and lifelike voices, tutor students with personalized lessons, and even help developers write and fix code — all in the blink of an eye.

Nowadays, almost every web or mobile app is evolving by integrating AI APIs to stay ahead with today’s technology. Developers use AI APIs to add smart features like chatbots that understand natural language, content generators that write automatically, image and voice creators, personalized learning assistants, and code helpers. These APIs let apps offer powerful AI capabilities without building complex models from scratch.

Types of Responses from AI APIs

1. Text Completion

This is the classic mode, where you send a prompt and get a plain text reply.

import google.generativeai as genai

# Configure the SDK with your API key (legacy google-generativeai package)
genai.configure(api_key="YOUR_API_KEY")

# Initialize Gemini model
model = genai.GenerativeModel("gemini-1.5-pro")

# Send prompt
response = model.generate_content("Write a short poem about the moon.")

# Print plain text reply
print("Plain text reply:")
print(response.text.strip())

2. Function Calling / Tool Use

Some AI APIs support structured responses where the AI can return JSON objects to trigger actions or call functions in your app.

import google.generativeai as genai
import json

# Configure the SDK with your API key (legacy google-generativeai package)
genai.configure(api_key="YOUR_API_KEY")

# Initialize Gemini model
model = genai.GenerativeModel("gemini-1.5-pro")

# Send prompt requesting structured response
response = model.generate_content(
    "Return JSON to call a function named 'add_numbers' with arguments a=5 and b=7. No explanation."
)

# Parse text as JSON (json.loads will raise if the model adds any extra text)
result_json = json.loads(response.text.strip())

# Print the structured result
print("Simulated function call JSON:")
print(json.dumps(result_json, indent=2))

# Example: You could then trigger your add_numbers function with this data.
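Once you have the parsed JSON, your app still has to map it onto real code. A minimal dispatch sketch follows; note that the "name" and "arguments" key names are an assumed response shape from the prompt above, not something the API guarantees:

```python
# Hypothetical local functions the model is allowed to "call"
def add_numbers(a, b):
    return a + b

TOOLS = {"add_numbers": add_numbers}

def dispatch(call: dict):
    """Look up the requested function by name and invoke it with the given arguments."""
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# A call object shaped like the JSON the model was asked to return
call = {"name": "add_numbers", "arguments": {"a": 5, "b": 7}}
print(dispatch(call))  # 12
```

Keeping an explicit allow-list like `TOOLS` also means the model can never trigger a function you didn't intend to expose.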

3. Structured Output

You ask the model to return structured data (usually JSON) that matches a pattern or schema. But it's just data; no action is triggered.

import os
import json
from dotenv import load_dotenv
from google import genai
from pydantic import BaseModel

# Example schema; shape the fields to match your use case
class CourseResponse(BaseModel):
    title: str
    summary: str

def generate_content(prompt: str):
    load_dotenv()
    api_key = os.getenv("GEMINI_API_KEY")
    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config={
            "response_mime_type": "application/json",
            "response_schema": CourseResponse,  # CourseResponse is a Pydantic model
        },
    )
    try:
        parsed = json.loads(response.text)
        return CourseResponse(**parsed)
    except (json.JSONDecodeError, ValueError):
        return None

Why Do Hallucinations Happen in AI APIs?

Hallucination occurs when an AI confidently produces false, misleading, or completely made-up responses. This can be a major issue in real-world applications.

Examples of Hallucination

  1. Fake Citations

“According to Smith 2022 in The AI Guidebook…” (the book doesn’t exist)

  2. Bad Code

Returns Python code that imports a nonexistent module

  3. Wrong Math

Calculates 23 × 5 = 135 instead of 115 (because it "sounded right")

Why Do AI Models Hallucinate?

AI models sometimes produce false or made-up information, which is known as hallucination. This happens for several reasons:

  1. They predict text, not facts.
    Language models are trained to predict the next word based on patterns, not verify truth.

  2. Vague or unclear prompts.
    If your input is too short or missing detail, the model fills in the blanks with guesses.

  3. Outdated or limited knowledge.
    Many models don’t have real-time access to the internet, so their knowledge may be outdated.

  4. No access to tools.
    Without plugins or API access, the model can't look up facts, calculate, or fetch real data.

  5. High temperature settings.
    A higher temperature makes responses more creative, but also increases the chance of hallucination.

What is Temperature in AI?

Temperature is a setting that controls how creative or random the AI’s responses are.

  • Low temperature (e.g., 0–0.3):
    The AI gives more focused, predictable, and accurate answers. Good for facts and code.

  • High temperature (e.g., 0.7–1.0):
    The AI becomes more creative, diverse, and expressive. Good for storytelling or brainstorming — but more likely to make things up (hallucinate).
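Conceptually, temperature rescales the model's next-token probabilities before sampling. A toy sketch of that rescaling (a softmax with temperature; the logit values here are invented) shows why a low temperature concentrates probability on the top choice:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)

# At low temperature the top token dominates; at high temperature the
# probability spreads out, so sampling picks unlikely tokens more often.
print(low[0], high[0])
```

The flatter high-temperature distribution is exactly why creative settings also raise the odds of a confidently wrong answer.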

How to Handle and Reduce Hallucination

  1. Write clear prompts
    Be specific. The more detail you give, the less the model needs to guess.

  2. Use system messages
    Set the model’s role, like: “You are a helpful assistant. Only respond with facts.”

  3. Validate responses
    Check outputs before using them — especially for structured data. Use tools like json.loads() or Pydantic.

  4. Post-process results
    Add logic to catch and filter out fake data, bad links, or unsupported answers.

  5. Use function calling or structured output
    Instead of asking for text, have the model return specific arguments that your system understands.

  6. Lower the temperature
    Set temperature=0 for more accurate, fact-based responses.

  7. Use real-time tools or APIs
    Connect the AI to your own tools, databases, or search functions to avoid guesses.

  8. Have a human in the loop
    For critical use cases, let a human review the AI’s response before it’s published or used.
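Steps 3 and 4 above (validate, then post-process) can be combined into one small gate that rejects malformed output before your app uses it. A stdlib-only sketch; the required keys and their types are a hypothetical contract, not part of any real API:

```python
import json

# Hypothetical contract: the model must return these keys with these types
REQUIRED_KEYS = {"value": int, "source": str}

def validate_response(raw: str):
    """Return the parsed dict if it matches the contract, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data

print(validate_response('{"value": 115, "source": "calculator"}'))  # passes the gate
print(validate_response('23 x 5 = 135'))  # None: not valid JSON
```

The same shape check is what Pydantic gives you with less code; the point is that nothing downstream ever sees an unvalidated model response.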