Platform Overview

KVS_AI_GUARD is an intelligent operations and maintenance (O&M) management system that uses large language models (LLMs) to automate network operations. It monitors the status of network devices, manages configuration, analyzes logs, and troubleshoots issues. Users can complete complex network management tasks with minimal input, which improves efficiency, reduces manual intervention, speeds up issue resolution, and helps keep the system stable.

Figure: KVS_AI_GUARD Platform Architecture

KVS_AI_GUARD Key Features

LLM (Large Language Model)

Core Role: Serves as the system's input hub, processing user instructions (e.g., queries, configuration requests) and generating corresponding operational commands such as status monitoring, configuration updates, or troubleshooting.

RAG (Retrieval-Augmented Generation)
  • Function: Enhances LLM outputs by retrieving data from local or external sources to supplement responses.
  • Local Knowledge (Short-Term Memory): Connects to an internal knowledge base to retrieve contextual records and provide instant feedback.
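The retrieval step described above can be illustrated with a minimal sketch. It is not the platform's implementation: the knowledge-base records, the function names (`retrieve`, `build_prompt`), and the token-overlap scoring are all assumptions made for illustration. A production RAG pipeline would more likely rank records by embedding similarity; simple token overlap is used here only to keep the example self-contained.

```python
from __future__ import annotations

import re

# Hypothetical local knowledge base; real records would come from the
# platform's internal store.
KNOWLEDGE_BASE = [
    "Switch sw-core-01 reported high CRC error rate on port 12 last week.",
    "Access point ap-lobby-3 was rebooted after a firmware update.",
    "VLAN 20 is reserved for guest wireless traffic.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, records: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k records that share the most tokens with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(r)), r) for r in records]
    # Keep only records with at least one shared token, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [record for _, record in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved records to the user query before it goes to the LLM."""
    context_block = "\n".join(f"- {line}" for line in context) or "- (no matching records)"
    return f"Context:\n{context_block}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "Why does sw-core-01 show CRC errors?"
    print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE)))
```

Only the record about sw-core-01 shares tokens with the sample question, so it is the only context line added to the prompt.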

Function Call – Execution Core

Function: Executes system-level operations. LLMs interact with this module to send operational commands to devices.

Integrated Actions:

  • Status: Queries the real-time status of switches or Meraki devices (e.g., connection health, bandwidth, error rate).
  • Configuration: Issues update commands to modify network settings or enable features.
  • Log Analysis: Triggers log parsing to extract insights on system faults or performance anomalies.
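One way to picture the three integrated actions is as a registry that routes a structured call emitted by the LLM to a handler. The sketch below is hypothetical throughout: the JSON call shape (`name` plus `arguments`), the action names, and the handler functions are assumptions, and the handlers are stubs that return placeholder data instead of contacting real switches or Meraki devices.

```python
import json
from typing import Any, Callable

# Registry mapping action names to handlers. All names here are hypothetical.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {}


def action(name: str):
    """Register the decorated function as the handler for an action name."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator


@action("status")
def get_status(device_id: str) -> dict[str, Any]:
    # Stub: a real handler would query the device for health, bandwidth, errors.
    return {"device_id": device_id, "action": "status", "result": "stubbed"}


@action("configuration")
def update_config(device_id: str, settings: dict[str, Any]) -> dict[str, Any]:
    # Stub: a real handler would push the settings to the device.
    return {"device_id": device_id, "action": "configuration", "applied": settings}


@action("log_analysis")
def analyze_logs(device_id: str) -> dict[str, Any]:
    # Stub: a real handler would parse device logs and return findings.
    return {"device_id": device_id, "action": "log_analysis", "findings": []}


def dispatch(call_json: str) -> dict[str, Any]:
    """Parse an LLM-emitted call and route it to the registered handler."""
    call = json.loads(call_json)
    name = call.get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        # Reject unknown actions instead of guessing at the intent.
        return {"error": f"unknown action: {name}"}
    return handler(**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "status", "arguments": {"device_id": "sw-01"}}'))
```

Keeping the set of callable actions in an explicit registry means the LLM can only trigger operations that were deliberately exposed, which matters when the commands reach live network devices.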

Troubleshooting – Independent Module

Function: Executes automated repair tasks based on results from the log analysis module.
Relation to Log Analysis: Depends on diagnostics to identify specific device or performance issues, then takes appropriate action.

  • Capabilities: Automatically executes repairs (e.g., rebooting, reconfiguring). Generates detailed incident reports if automation fails.
  • Workflow: Upon fault detection, the LLM triggers log analysis via function calls. Based on the analysis output, the troubleshooting module performs remediation autonomously, with minimal LLM involvement.
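The workflow above (findings from log analysis in, automated repair attempted, incident report generated if the repair fails) can be sketched as follows. This is an illustration under stated assumptions, not the module's actual design: the issue labels, the `Finding` and `Outcome` types, and the stubbed `reboot` and `reconfigure` functions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    device_id: str
    issue: str  # hypothetical labels, e.g. "unresponsive" or "config_drift"


@dataclass
class Outcome:
    repaired: list[str] = field(default_factory=list)
    incident_reports: list[str] = field(default_factory=list)


def reboot(device_id: str) -> bool:
    # Stub: pretend the reboot succeeds.
    return True


def reconfigure(device_id: str) -> bool:
    # Stub: pretend reconfiguration fails, to exercise the report path.
    return False


# Hypothetical mapping from diagnosed issue to automated repair.
REMEDIATIONS: dict[str, Callable[[str], bool]] = {
    "unresponsive": reboot,
    "config_drift": reconfigure,
}


def troubleshoot(findings: list[Finding]) -> Outcome:
    """Attempt a repair per finding; record an incident report when it fails."""
    outcome = Outcome()
    for finding in findings:
        repair = REMEDIATIONS.get(finding.issue)
        succeeded = repair(finding.device_id) if repair else False
        if succeeded:
            outcome.repaired.append(finding.device_id)
        else:
            reason = "repair failed" if repair else "no known remediation"
            outcome.incident_reports.append(
                f"{finding.device_id}: {finding.issue} ({reason})"
            )
    return outcome


if __name__ == "__main__":
    print(troubleshoot([Finding("sw-01", "unresponsive"), Finding("sw-02", "config_drift")]))
```

Note that the loop runs without consulting the LLM, which mirrors the "minimal LLM involvement" described above: the model starts the diagnosis, and the remediation logic itself is deterministic.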

Technical Advantages

  • Automated Operations: Automates device monitoring, configuration, and fault remediation, reducing manual workload and labor costs.
  • Rapid Fault Resolution: Identifies and resolves issues quickly through log analysis and autonomous troubleshooting, which improves system stability.
  • Intelligent Decision Support: Proactively detects risks using data retrieval and analysis, helping O&M teams optimize maintenance workflows.
  • Lower Operational Costs: Minimizes reliance on highly specialized O&M staff through automation and intelligence.
  • Real-Time Monitoring and Transparency: Users can view real-time device status, configurations, and log insights, which keeps operations transparent and efficient.

Typical Application Scenarios

  • Large Enterprises: Managing a vast number of IT assets with limited O&M staff.
  • High-Tech Enterprises: Requiring cutting-edge tools to support advanced IT operations.
  • Digitally Transforming Organizations: Institutions such as universities or data-driven businesses that need smart O&M infrastructure to support their transformation.

Service Models

  • Standard Platform: Lightweight, ready-to-deploy solution.
  • Custom Development: Deep adaptation tailored to business requirements.
  • Joint Operations: Integrated delivery of platform and O&M services.