Question 1

What is RAG and when should I use it?

Accepted Answer

RAG, or retrieval-augmented generation, is a pattern where you fetch relevant documents from your own data and hand them to a language model as context before it answers. Use it when the model needs to know things it was never trained on: your product docs, support tickets, contracts, or anything private. It is the right default for question-answering and search over a body of content you control, because it keeps answers grounded and lets you cite sources.

Question 2

RAG vs fine-tuning, which one do I need?

Accepted Answer

RAG changes what the model knows; fine-tuning changes how the model behaves. If you need the model to answer from facts that change often, reach for RAG, because you can update the data without retraining anything. Fine-tune when you need a consistent format, tone, or a narrow skill the base model keeps getting wrong. Most teams start with RAG plus good prompting, and only fine-tune once they have evidence that prompting alone cannot get them there.

Question 3

Do I even need a custom AI build, or is a SaaS tool enough?

Accepted Answer

If an off-the-shelf tool already does what you need, buy it. Custom work pays off when the AI has to sit inside your own data, your own workflows, or your own product, and no vendor covers that exact shape. A good test: if you would be handing a third party your most sensitive data or your core user experience, that is usually where a custom build earns its keep.

Question 4

What is the difference between an LLM and an AI agent?

Accepted Answer

An LLM is a model that takes text in and produces text out. An agent wraps a model in a loop that lets it call tools, read results, and decide what to do next until a task is done. Plain LLM calls are great for single-shot tasks like summarizing or classifying. Agents make sense when the work needs several steps, external actions, and the model deciding the order.

Question 5

Can AI features be added to my existing product?

Accepted Answer

Yes, and that is the common case. Most AI features ship as a new endpoint or service that your current app calls, so the core product keeps running while the AI part is built and tested alongside it. We work inside your stack and your repos rather than asking you to rebuild around a new platform.

Question 6

Why does my RAG chatbot give wrong answers?

Accepted Answer

Usually the problem is retrieval, not the model. If the right chunk never makes it into the context, even the best model will guess. Common causes are crude fixed-size chunking that splits sentences, pure vector search that misses exact terms like part numbers, and no reranking to push the best result to the top. Fixing retrieval fixes most hallucinations. We cover this in our RAG development work at /services/rag-development-india.

Question 7

What is chunking and why does it matter?

Accepted Answer

Chunking is how you split your documents into pieces small enough to retrieve and feed to the model. It matters because a bad split can cut a table in half or separate a question from its answer, so the retrieved piece is useless even when it scores a match. Good chunking respects document structure: it keeps headings with their content, preserves tables and code, and overlaps a little so context does not get severed at the boundary.
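
As a rough illustration of splitting on structure with a little overlap, here is a minimal sketch that packs whole paragraphs into chunks and repeats the last paragraph of each chunk at the start of the next. The function name, the blank-line paragraph rule, and the character limit are assumptions for the example; a real splitter would also handle headings, tables, and code blocks.

```python
def chunk_paragraphs(text: str, max_chars: int = 400, overlap_paragraphs: int = 1) -> list[str]:
    """Pack whole paragraphs into chunks without cutting mid-paragraph.

    The last `overlap_paragraphs` paragraphs of each chunk are repeated at the
    start of the next one so context is not severed at the boundary.
    A chunk can exceed `max_chars` if a single paragraph is longer than the limit.
    """
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap_paragraphs:] if overlap_paragraphs else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "A" * 10 + "\n\n" + "B" * 10 + "\n\n" + "C" * 10
chunks = chunk_paragraphs(doc, max_chars=25)
```

With these inputs the middle paragraph appears in both chunks, which is the overlap doing its job.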

Question 8

Vector search or keyword search, which is better?

Accepted Answer

Use both. Vector search captures meaning, so it finds a passage about 'cancelling a subscription' when the user typed 'how do I stop being billed.' Keyword search catches exact strings that vectors gloss over: SKUs, error codes, names. Hybrid retrieval fuses the two and reranks the combined results, which typically beats either one alone on real queries.
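
One common way to fuse two ranked lists is reciprocal rank fusion (RRF). It is shown here as an example technique, not as the only option or the one any particular product uses. The sketch assumes each retriever returns document IDs best-first; the IDs and the constant k=60 (a widely used default) are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first rankings into one.

    Each document scores 1 / (k + rank) in every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]   # hypothetical vector-search results
keyword_hits = ["b", "d", "a"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["b", "a", "d", "c"]
```

A separate reranking model can then reorder the top few fused results; that step is left out here.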

Question 9

How do I stop an AI from making things up?

Accepted Answer

Ground it and constrain it. Feed the model retrieved context and instruct it to answer only from that context, then have it say it does not know when the context is thin. Show citations so a wrong answer is easy to spot and trace. The honest part is that you cannot drive hallucination to zero; you drive it low enough to trust, and you measure it with evals so it does not creep back up.
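
A minimal sketch of the grounding instruction, assuming the retrieved passages are already in hand. The wording of the prompt and the function name are illustrative; the exact phrasing that works best depends on the model and should be checked against your evals.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt that restricts the model to numbered context passages."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered context below. "
        "Cite the passage numbers you used, like [1]. "
        "If the context does not contain the answer, reply exactly: I don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "How long do refunds take?",
    ["Refunds take 5 business days.", "Shipping is free over $50."],
)
```

Numbering the passages is what makes citations cheap to display and easy to trace back to a source.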

Question 10

Is RAG a silver bullet for AI accuracy?

Accepted Answer

No. RAG fixes the 'the model does not know your data' problem, but it introduces a retrieval problem, and a wrong retrieval produces a confident wrong answer. It also does not help with tasks that are about reasoning or formatting rather than facts. RAG is a strong default for grounded question-answering, not a cure-all, and we are upfront about where it falls short.

Question 11

What is an AI agent?

Accepted Answer

An AI agent is a system where a language model runs in a loop: it decides on an action, calls a tool to do it, reads the result, and repeats until the task is finished. The tools might be a search, a database query, an API call, or running code. The model is the brain; the tools are the hands.

Question 12

When should I build an agent instead of a simple workflow?

Accepted Answer

Build an agent when the steps are not known in advance and depend on what the model finds along the way. If the path is fixed, a plain workflow with scripted steps is cheaper, faster, and easier to debug, so prefer it. Reserve agents for open-ended tasks like research, triage, or multi-step automation where flexibility is worth the extra complexity and cost.

Question 13

Are AI agents reliable enough for production?

Accepted Answer

They can be, with guardrails. Reliable agents have a bounded set of tools, limits on how many steps they take, checks on their outputs, and a human in the loop for anything risky or irreversible. The failure mode of an unconstrained agent is that it loops, burns tokens, or takes an action you did not want. Production readiness comes from the engineering around the model, not the model alone.
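
Two of those guardrails, a tool allowlist and a step limit, can be sketched in a few lines. Everything named here is hypothetical: `decide` stands in for the model call, and the action format (a dict with either "tool" and "arg" or "final") is an assumption made for the example.

```python
from typing import Callable

# Hypothetical allowlist; real tools would query a database or call an API.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_orders": lambda arg: f"order {arg}: shipped",
}


def run_agent(decide: Callable[[list[str]], dict], task: str, max_steps: int = 5) -> str:
    """Run a decide/act/observe loop with a step limit and a tool allowlist."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = decide(history)
        if "final" in action:
            return action["final"]
        tool = TOOLS.get(action.get("tool", ""))
        if tool is None:
            # Unknown tools are refused and reported back, never executed.
            history.append(f"error: unknown tool {action.get('tool')!r}")
            continue
        history.append(f"result: {tool(action['arg'])}")
    return "stopped: step limit reached"


def scripted_model(history: list[str]) -> dict:
    """Stand-in for a model: look up the order once, then finish."""
    if any(entry.startswith("result:") for entry in history):
        return {"final": history[-1]}
    return {"tool": "search_orders", "arg": "42"}


answer = run_agent(scripted_model, "where is order 42?")
```

Output checks and human approval for irreversible actions would sit around the tool call; they are omitted to keep the sketch short.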

Question 14

What is tool use or function calling?

Accepted Answer

Tool use is how a model takes action beyond producing text. You describe the tools available, such as 'search orders' or 'send email,' and the model returns a structured request to call one with specific arguments. Your code runs the tool, returns the result, and the model continues. It is the mechanism that turns a chat model into something that can actually do things.
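
To make the round trip concrete, here is a sketch of the application side: the model's structured request arrives as JSON, your code looks up the named function, runs it, and packages the result to send back. The request shape ("name" plus "arguments") is a simplified assumption, not any specific provider's wire format, and `search_orders` is a stand-in.

```python
import json


def search_orders(customer_id: str) -> list[str]:
    # Stand-in for a real order lookup.
    return [f"order-1 for {customer_id}"]


# Only functions registered here can be called by the model.
REGISTRY = {"search_orders": search_orders}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model's structured request and return the result as JSON."""
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]  # raises KeyError for unregistered tools
    result = func(**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})


reply = dispatch('{"name": "search_orders", "arguments": {"customer_id": "c7"}}')
```

The returned JSON is what gets appended to the conversation so the model can continue with the result in view.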

Question 15

Which model should I use, Claude or GPT?

Accepted Answer

Both are strong; the right pick depends on the task, not the brand. We test the top candidates from Anthropic, OpenAI, Google, and the open-source field against your actual prompts and data, then choose on quality, latency, and cost for that job. It is also common to mix: a cheaper model for routine calls and a stronger one for the hard cases. The decision should come from your evals, not a leaderboard.

Question 16

Should we self-host an LLM?

Accepted Answer

For most teams, no, at least not at first. Hosted APIs are cheaper until you hit real scale, and they free you from running GPUs. Self-hosting earns its place when you have strict data-residency rules, very high steady volume, or a need to run a specific open model you have customized. We walk through the trade-offs at /blog/should-we-self-host-an-llm.

Question 17

How do I avoid getting locked into one AI vendor?

Accepted Answer

Put a thin abstraction between your app and the model provider, so swapping from one API to another is a config change, not a rewrite. Keep your prompts, evals, and data in your own systems rather than a vendor's tooling. The goal is that if pricing or quality shifts, you can move in days, not quarters.
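
One way to picture that thin abstraction is a small interface that application code depends on, with one adapter per provider behind it. The class names and the two toy adapters below are hypothetical placeholders; real adapters would wrap each vendor's SDK.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the application is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    # Placeholder adapter; a real one would call a provider's API.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    # Second placeholder adapter, to show the swap.
    def complete(self, prompt: str) -> str:
        return prompt.upper()


PROVIDERS: dict[str, type] = {"echo": EchoModel, "upper": UpperModel}


def get_model(name: str) -> ChatModel:
    """Pick the provider from a config value."""
    return PROVIDERS[name]()


def summarize(model: ChatModel, text: str) -> str:
    # Application code never imports a vendor SDK directly.
    return model.complete(f"Summarize: {text}")


first = summarize(get_model("echo"), "hi")
second = summarize(get_model("upper"), "hi")
```

Switching providers here means changing the string passed to `get_model`, which is the "config change, not a rewrite" property in miniature.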

Question 18

What is a context window and why does it matter?

Accepted Answer

The context window is how much text a model can consider at once, counting both your input and its output. It matters because everything the model knows for a given request (the instructions, the retrieved documents, the conversation) has to fit inside it. Bigger windows let you pass more context, but stuffing them full costs more and can actually hurt quality, so retrieving the right slice still beats dumping everything in.
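
A minimal sketch of fitting retrieved passages into a fixed budget, assuming the passages arrive sorted best-first. Token counts are approximated as one token per whitespace-separated word, which is only a rough stand-in; a real system would use the model's own tokenizer.

```python
def pack_context(passages: list[str], budget_tokens: int) -> list[str]:
    """Keep the highest-ranked passages that fit within the token budget."""
    chosen: list[str] = []
    used = 0
    for passage in passages:  # assumed sorted best-first
        cost = len(passage.split())  # crude token estimate
        if used + cost > budget_tokens:
            continue  # skip what does not fit, keep trying smaller passages
        chosen.append(passage)
        used += cost
    return chosen


packed = pack_context(["a b c", "d e f g", "h"], budget_tokens=4)
# packed == ["a b c", "h"]
```

Remember to leave room in the budget for the instructions and for the model's output, since both count against the same window.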

Question 19

Do I need a vector database?

Accepted Answer

Only if you are doing retrieval at meaningful scale. For a small set of documents, a Postgres extension like pgvector is usually enough and keeps your stack simple. A dedicated vector database earns its place when you have millions of vectors, need fast filtered search, or want managed scaling. Pick based on your data size, not on what is fashionable.

Question 20

How much does an AI MVP cost?

Accepted Answer

A focused AI MVP typically lands in the low tens of thousands of dollars, depending on scope, data, and how much already exists. The cost drivers are integration with your systems, data quality, and how high the bar is for accuracy, not the model itself. We scope tightly so you pay for a working slice rather than an open-ended research project. More on this at /services/ai-mvp-development-india.

Question 21

How long does an AI build take?

Accepted Answer

A first working version is usually weeks, not months, when the scope is clear. The early weeks go to data, retrieval, and a thin end-to-end path you can actually try; the rest goes to evals and hardening. Long timelines almost always trace back to fuzzy scope or messy data, which is why we pin both down before we start.

Question 22

Why do my LLM costs keep climbing in production?

Accepted Answer

The usual culprits are oversized prompts, calling a premium model for tasks a cheaper one handles fine, retries that pile up, and no caching of repeated work. Costs scale with tokens, so trimming context and routing easy requests to smaller models often cuts the bill sharply without hurting quality. The fix is measurement: once you see where the tokens go, the savings are usually obvious.
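
Two of those fixes, caching repeated work and routing easy requests to a smaller model, can be sketched together. The routing rule (prompt length in words) is a deliberately crude assumption, and the `calls` counter stands in for billed API calls; in practice the router would be tuned against evals.

```python
from functools import lru_cache

# Counts simulated API calls per model tier.
calls = {"small": 0, "large": 0}


def pick_model(prompt: str, threshold_words: int = 50) -> str:
    """Crude router: short prompts go to the cheaper tier."""
    return "small" if len(prompt.split()) <= threshold_words else "large"


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Answer a prompt, hitting the 'API' only on a cache miss."""
    model = pick_model(prompt)
    calls[model] += 1  # stands in for a billed request
    return f"[{model}] reply"


first_reply = answer("hi")
answer("hi")  # identical prompt: served from the cache, no new call
```

Exact-match caching only helps when prompts repeat verbatim, so it pays off most for fixed system tasks and frequently asked questions.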

Question 23

How do I estimate the budget for an AI feature?

Accepted Answer

Split it into build cost and running cost. Build cost is engineering time for integration, retrieval, and evals; running cost is tokens per request times your expected volume, plus any hosting. Estimate running cost early with a real prototype, because a feature that is cheap at a thousand users a day can be alarming at a million. We size both before committing so there are no surprises later.
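
The running-cost arithmetic is simple enough to write down. The token counts and per-million-token prices below are made-up placeholders, not any provider's actual pricing; substitute numbers measured from your own prototype.

```python
def monthly_running_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    days: int = 30,
) -> float:
    """Estimate monthly model spend in dollars, excluding hosting."""
    per_request = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return per_request * requests_per_day * days


# Placeholder figures: 2,000 input and 500 output tokens per request,
# priced at $3 and $15 per million tokens respectively.
small_scale = monthly_running_cost(1_000, 2_000, 500, 3.0, 15.0)
large_scale = monthly_running_cost(1_000_000, 2_000, 500, 3.0, 15.0)
```

Under these placeholder numbers the bill is about $405 a month at a thousand requests a day and about $405,000 at a million, which is why volume assumptions belong in the estimate from the start.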

Question 24

What does it cost to keep an AI feature running over time?

Accepted Answer

Plan for model usage, monitoring, and a steady trickle of maintenance. Models and prices change, your data drifts, and edge cases surface from real users, so a small ongoing investment keeps quality from sliding. Teams that budget only for the build and nothing after are the ones whose AI quietly degrades a few months in.

Question 25

What is an eval and why does it matter?

Accepted Answer

An eval is a test set for your AI: a collection of real inputs paired with what a good answer looks like, run automatically so you can score every change. It matters because AI behavior is not deterministic, so 'it looked fine when I tried it' is not evidence. Without evals you are shipping blind and reacting to complaints. With them, you know whether a prompt tweak or model swap made things better or worse before it reaches users.
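
In its simplest form an eval is a loop over cases and a score. The sketch below assumes a hypothetical `system` callable that maps a question to an answer, and it grades by checking whether an expected phrase appears in the answer. Substring matching is a crude grader chosen to keep the example short; real evals often use stricter checks or a rubric.

```python
from typing import Callable


def run_eval(system: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose answer contains the expected phrase."""
    passed = sum(
        1
        for question, expected in cases
        if expected.lower() in system(question).lower()
    )
    return passed / len(cases)


# Hypothetical eval set and a stand-in system that always gives the same answer.
cases = [
    ("What is the refund window?", "30 days"),
    ("What is the support email?", "help@example.com"),
]
score = run_eval(lambda question: "Refunds are accepted within 30 days.", cases)
# score == 0.5
```

Running the same cases before and after a prompt tweak or model swap gives two comparable scores, which is the evidence that a manual spot check cannot provide.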

Question 26

How do I measure AI quality in a way my board trusts?

Accepted Answer

Tie the metric to the outcome the business cares about, then track it on a fixed eval set over time. Accuracy on real questions, resolution rate, or escalation rate are the kinds of numbers leadership can act on, far more than a vibe demo. The credibility comes from measuring the same thing every release so the trend is honest. We dig into this at /blog/measuring-ai-quality-evals-your-board-trusts.

Question 27

How do you handle hallucinations in production?

Accepted Answer

We design for them rather than pretend they are gone. That means grounding answers in retrieved context, showing citations, letting the system decline when it is unsure, and running evals that catch regressions before release. For anything high-stakes, a human reviews the output. The aim is a system whose mistakes are rare, visible, and recoverable.

Question 28

When is AI the wrong tool for the job?

Accepted Answer

When a deterministic rule or a simple lookup would do the job perfectly, AI just adds cost, latency, and uncertainty. It is also the wrong tool when you cannot tolerate any error and have no way to review outputs, or when you do not have data to ground or evaluate it. Part of doing this well is telling you when not to use AI, and we will.

Question 29

Where is FoundrySoft based?

Accepted Answer

We are an India-based studio of senior engineers, working primarily with US companies. Our team builds production software and AI systems, and we operate with substantial daily overlap with US business hours so collaboration feels close rather than offshore.

Question 30

How does an engagement with FoundrySoft work?

Accepted Answer

We start by scoping the problem and the data, agree on a clear, fixed slice of work, and ship a working version in weeks. You work directly with the senior engineers building it, not a layer of account managers. From there we iterate against evals and your feedback rather than a fixed spec written before anyone understood the problem.

Question 31

Do you have enough time-zone overlap with US teams?

Accepted Answer

Yes. We structure our day for roughly nine or more hours of overlap with US business hours, so standups, reviews, and quick questions happen in real time rather than over a 24-hour delay. The point is that working with us should feel like working with a team a few time zones away, not on the other side of a wall.

Question 32

Can I own the IP?

Accepted Answer

Yes, you own all of it. The code, the prompts, the data, and the models we build for you are yours, and that is in writing from the start. We sign NDAs up front and hand over everything, because a build you cannot fully own and run yourself is not much of an asset.

Question 33

How do you handle data privacy and security?

Accepted Answer

We sign NDAs before we see anything sensitive, work within your accounts and infrastructure where possible, and keep your data out of any model-provider training by using the right API settings and agreements. For regulated or high-sensitivity data we will architect around your residency and access rules. Security is a design constraint we plan for, not a checkbox at the end.

Question 34

How do I evaluate an AI development partner?

Accepted Answer

Look for engineers who can show shipped, production AI rather than demos, who talk about evals and failure modes instead of only the happy path, and who will tell you when AI is the wrong call. Ask how they measure quality, who owns the IP, and what happens after launch. We wrote a full guide on this at /blog/how-to-evaluate-ai-development-partner.
