Evaluating AI Systems: Testing LLMs, RAG, and Agents (Kindle Edition)

4.5 out of 5 (39 ratings)

US$4.00
Price when purchased online
Free shipping, free 30-day returns

Sold and shipped by asiannetworkunlimited.com

Product details

Management number 220491396
Release Date 2026/05/03
List Price US$4.00
Model Number 220491396

The definitive guide to testing AI systems that actually work.

Most AI systems ship without meaningful evaluation. Teams eyeball a few responses, declare the system "good enough," and push to production. Then quality degrades, hallucinations appear, and nobody knows why.

Evaluating AI Systems is a practical, technical guide to building evaluation frameworks for LLMs, RAG pipelines, and AI agents. Written by Alex Merced, Head of Developer Relations at Dremio and author of multiple technical books, it covers the full evaluation lifecycle from dataset generation to production monitoring.

What you will learn:
- Understand why traditional software testing fails for AI and what to do instead
- Build golden evaluation datasets that accurately measure system quality
- Implement prompt testing with tools like DeepEval, RAGAS, and promptfoo
- Design evaluation metrics for correctness, faithfulness, relevance, and safety
- Detect and measure hallucinations with automated pipelines
- Use LLM-as-judge patterns with bias mitigation and multi-model consensus
- Build regression testing that catches quality degradation before users do
- Deploy production monitoring with drift detection and quality alerting
- Evaluate multi-step agent workflows with tool-use accuracy metrics
- Manage evaluation costs with tiered strategies, from smoke tests to deep expert reviews

Written with verified specifications for GPT-5.4, Claude Sonnet 4.6, and Gemini 3.1 Pro throughout. Every technique is immediately applicable to production AI systems.

For AI engineers: Build evaluation pipelines that prevent quality incidents.
For QA engineers: Apply testing discipline to the most untestable systems you have ever worked with.
For engineering managers: Make informed quality decisions with data, not gut feeling.

X-Ray Not Enabled
Edition 1st
Language English
File size 10.3 MB
Page Flip Enabled
Publisher Alex Merced Books
Word Wise Not Enabled
Print length 372 pages
Accessibility
Screen Reader Supported
Publication date March 16, 2026
Enhanced typesetting Enabled

Customer ratings & reviews

4.5 out of 5
39 ratings | 16 reviews
5 stars 83% (32)
4 stars 4% (2)
3 stars 2% (1)
2 stars 1% (0)
1 star 10% (4)

There are currently no written reviews for this product.