Journal of Advanced Engineering Technology and Management
ISSN (Online): 3049-3684
Volume: 1 Issue: 2 | Open Access | September 2025
LLM-Orchestrated Microservices: A New Paradigm for Intelligent Distributed Systems
Avneet Bansal
Independent Researcher, avneetbansal9815@gmail.com
Abstract
Microservices architecture has become a popular approach for building scalable, cloud-native distributed applications because it supports modular and flexible deployment, isolation, and elasticity. Monolithic applications are decomposed into microservices that can be developed, deployed, and scaled independently, which improves agility, resilience, and continuous delivery in distributed systems. However, although technologies such as Kubernetes and service meshes have matured considerably, current orchestration solutions still rely on static, imperative models that only react to changing conditions: they scale up or down when thresholds are crossed, execute predefined workflows, and enforce predetermined policies. These solutions work well under normal conditions but falter when facing uncertainty, mixed workloads, cascading failures, and changing dependencies.
Large language models (LLMs) are advancing rapidly, and their capabilities in semantic understanding, multi-step reasoning, and context awareness make them candidates for a general control plane for distributed applications. By augmenting traditional cloud-native platforms with LLMs, we can develop new forms of orchestration that reason about the state of the system at runtime and take appropriate actions. In this paper, we present LLM-Orchestrated Microservices (LOM), a system that uses LLMs to reason about microservices running on Kubernetes with the Istio service mesh. Its key features include semantic service-level reasoning, SLA-based scaling, failure root cause analysis, automated failure recovery planning, and dynamic workflow resolution.
We implemented a prototype of LOM and evaluated it on a variety of workloads, including e-commerce, healthcare API, and smart factory microservice workflows. We benchmarked it against the Kubernetes Horizontal Pod Autoscaler, rule-based orchestrators, and static workflow scheduling. We found that LLM-based orchestration can make better adaptive decisions, recover from failures faster, improve SLA compliance, and increase the overall resilience of the distributed system, at the cost of additional overhead and complexity.
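For context on the threshold-driven baseline named above, the following is a minimal sketch of the reactive replica rule documented for the Kubernetes Horizontal Pod Autoscaler, where the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. It is an illustration only and not part of the LOM prototype. The function name and the `min_replicas` and `max_replicas` defaults are hypothetical; the 0.1 tolerance is assumed to match the Kubernetes default.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1, min_replicas=1, max_replicas=10):
    """Reactive threshold rule in the style of the Kubernetes HPA (sketch).

    min_replicas and max_replicas are illustrative values, not defaults
    taken from the paper or from any specific cluster configuration.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale proportionally to the metric ratio, rounding up.
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

The rule sees only a single metric ratio at one instant. It has no view of service dependencies, failure causes, or SLA context, which is the limitation the abstract attributes to static orchestration.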
Keywords: Large Language Models, Microservices, Kubernetes, Cloud Computing, Service Mesh, Intelligent Orchestration, Distributed Systems, Autonomous Infrastructure