AI Ticket Triage System

Overview

Customer support teams at large organizations are often overwhelmed by the sheer volume of incoming tickets. The AI Ticket Triage System was built to automatically categorize, prioritize, and generate suggested responses for incoming customer queries, dramatically reducing the time human agents spend on routine triage.

The Problem

Before this system, the customer support team at RMT Engineering spent approximately 40% of their time just reading and routing tickets to the correct department. High-priority issues were sometimes buried under a mountain of low-priority feature requests, leading to missed SLAs and frustrated customers.

Architecture

The system was designed with a decoupled architecture to ensure the AI processing didn't block the core ticket management functionality.

Frontend: Next.js App Router for a responsive, real-time agent dashboard.
Backend: FastAPI for high-performance API endpoints.
Background Processing: Celery and Redis to handle the asynchronous LLM calls.
Database: PostgreSQL for persistent storage, with the pgvector extension for semantic search.
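
To make the semantic search component more concrete, here is a minimal pure-Python sketch of the ranking that a vector search performs: stored ticket embeddings are ordered by cosine distance to a query embedding, which is the quantity pgvector's `<=>` operator computes. The function names, the in-memory dictionary, and the tiny two-dimensional vectors are illustrative assumptions; the write-up does not describe the actual schema or queries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_tickets(
    query: Sequence[float],
    stored: Dict[int, Sequence[float]],  # hypothetical: ticket id -> embedding
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return the k stored tickets closest to the query, nearest first."""
    scored = [(ticket_id, cosine_distance(query, emb)) for ticket_id, emb in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

In a real deployment the database would do this ranking, typically with an index, rather than application code scanning every row.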

Technical Challenges

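The biggest challenge was latency. Calling OpenAI's API synchronously would cause the ticket creation endpoint to hang for up to 5 seconds. To solve this, we moved the LLM call out of the request path into a background task and implemented an event-driven architecture that uses WebSockets to push the results to the dashboard.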

When a ticket is submitted, it is immediately saved to the database with a pending_triage status. A background Celery task is triggered to perform the AI analysis. Once complete, the backend emits a WebSocket event to the frontend, which automatically updates the UI using React Query cache invalidation.
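
The flow above can be sketched with the standard library alone. This is a simplified illustration, not the production code. An in-memory dict stands in for PostgreSQL, a `queue.Queue` and a worker function stand in for Redis and Celery, and a list of callbacks stands in for WebSocket subscribers. Only the `pending_triage` status comes from the system as described. The `triaged` status, the `ticket.triaged` event name, and `fake_llm_triage` are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Ticket:
    id: int
    text: str
    status: str = "pending_triage"
    category: Optional[str] = None
    priority: Optional[str] = None


# Stand-ins (assumptions): database, task broker, and WebSocket subscribers.
TICKETS: Dict[int, Ticket] = {}
TASKS: "queue.Queue[int]" = queue.Queue()
SUBSCRIBERS: List[Callable[[dict], None]] = []


def fake_llm_triage(text: str) -> dict:
    """Hypothetical placeholder for the slow LLM call."""
    urgent = any(word in text.lower() for word in ("outage", "down", "urgent"))
    return {
        "category": "incident" if urgent else "feature_request",
        "priority": "high" if urgent else "low",
    }


def create_ticket(text: str) -> Ticket:
    """Request path: persist the ticket, enqueue triage, return immediately."""
    ticket = Ticket(id=len(TICKETS) + 1, text=text)
    TICKETS[ticket.id] = ticket
    TASKS.put(ticket.id)  # the LLM call is NOT made here
    return ticket


def triage_worker() -> None:
    """Background path: run the analysis, update the ticket, notify listeners."""
    while True:
        ticket_id = TASKS.get()
        ticket = TICKETS[ticket_id]
        result = fake_llm_triage(ticket.text)
        ticket.category = result["category"]
        ticket.priority = result["priority"]
        ticket.status = "triaged"
        event = {"type": "ticket.triaged", "ticket_id": ticket.id}
        for notify in SUBSCRIBERS:
            notify(event)  # stands in for emitting a WebSocket event
        TASKS.task_done()
```

Running `triage_worker` in a daemon thread and calling `create_ticket(...)` shows the key property: the caller gets a ticket back in the `pending_triage` state right away, and subscribers hear about the result later, at which point a dashboard would refetch the ticket.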

Lessons Learned

Always decouple LLM calls: They are inherently slow and unpredictable. Never put them in the critical path of a user request.
Prompt engineering is an ongoing process: We had to implement a feedback loop where agents could correct the AI's categorization, and we then used those corrections to refine the prompts.
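
One possible shape for such a feedback loop is sketched below. It assumes that agent corrections are fed back into the prompt as few-shot examples, which is only one way to use them; the write-up does not say how the prompts were actually updated. The `Correction` and `FeedbackStore` types, the prompt wording, and the category names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Correction:
    ticket_text: str
    predicted: str  # category the model chose
    corrected: str  # category the agent chose instead


class FeedbackStore:
    """Keeps the most recent agent corrections (hypothetical in-memory store)."""

    def __init__(self, max_examples: int = 3) -> None:
        self._recent: Deque[Correction] = deque(maxlen=max_examples)

    def record(self, ticket_text: str, predicted: str, corrected: str) -> None:
        # Only genuine disagreements are useful as corrective examples.
        if predicted != corrected:
            self._recent.append(Correction(ticket_text, predicted, corrected))

    def examples(self) -> List[Correction]:
        return list(self._recent)


def build_prompt(ticket_text: str, store: FeedbackStore) -> str:
    """Assemble a classification prompt with corrections as few-shot examples."""
    lines = ["Classify the support ticket into a category.", ""]
    for correction in store.examples():
        lines.append(f"Ticket: {correction.ticket_text}")
        lines.append(f"Category: {correction.corrected}")
        lines.append("")
    lines.append(f"Ticket: {ticket_text}")
    lines.append("Category:")
    return "\n".join(lines)
```

Bounding the store with `maxlen` keeps the prompt from growing without limit as corrections accumulate; a real system would likely also persist corrections and select examples by relevance rather than recency.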

Future Improvements

In the next iteration, we plan to replace the external API calls for basic categorization with a specialized, locally hosted small language model (SLM), further reducing latency and operational costs.