🤖 AI Agent (15 items)
Latest AI agent research papers, analysis, and improvement insights
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
Agent Audit: A Security Analysis System for LLM Agent Applications
DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
A Framework for Formalizing LLM Agent Security
AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
WirelessBench: A Tolerance-Aware LLM Agent Benchmark for Wireless Network Intelligence
AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection
DALI: LLM-Agent Enhanced Dual-Stream Adaptive Leadership Identification for Group Recommendations
Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents
Enhancing User Resilience Against AI-Augmented Phishing
kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation
SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy
Retrieval-augmented LLM with structured sampling for Building Management Systems point tagging under minimal context
Introducing the OpenAI Safety Bug Bounty program
Sashiko: An agentic Linux kernel code review system