
A Reinforced Cognitive Framework for Micro-Robots' Autonomous Navigation

A critical research review exploring the future of micro-robots' autonomous navigation in minimally invasive surgery through reinforced cognitive frameworks.


Series: Medical Robotics Research


Research Abstract

As robotics advances into an era of embodied intelligence, powered by emerging technologies like digital simulation and deep learning, the autonomous navigation of micro-robots in minimally invasive surgery is undergoing a rapid transformation. This shift takes place as we move our focus from the external world to the internal body. Starting with a brief summary of the work of Heunis et al.[1], we will identify the key modalities of micro-robots' autonomous navigation, considering both hardware and algorithm design. Amid the evolving robotics research landscape, there lies an emerging vision: transferring reinforced cognitive micro-robots from a real-time digital simulation to the complex vascular environment, with an emphasis on surgical efficiency and safety.

To drive more practical advancements, we will explore technology-oriented details aligned with the interests of the Surgical Robotics Laboratory. We will examine tools like generative models for image preprocessing, Vision-Language Models (VLMs) for cognitive development, Graph Transformers (GTs) for vascular segmentation, and Gaussian Splatting (GS) for flexible simulation. This will culminate in a discussion on collaborative swarm control. With a broad view of the current landscape and its technological components, the vision of micro-robots' autonomous navigation is no longer a distant dream, but an achievable reality.

Core Paper Summary

The internally magnetic rotablation catheter introduced by Heunis et al.[1] aims to achieve arterial lumen navigation for minimally invasive arterial stenosis treatment. It can be equipped with sufficient actuation driven by a wireless controller to debulk calcified deposits, while also meeting clinical requirements for efficiency and safety. The structure follows a cyclic pattern, as depicted in Figure 1, unfolding as follows:

Figure 1: The cyclic structure of the provided core paper [1].

Dance in Chains: Considering both medical and design constraints, this paper presents a power screw unit with a spring-loaded actuator that fits into the inner side of vessels, preventing arterial damage that might be caused by overheated electromagnetic coils. It promises to combine rotational and transluminal motion, enabling efficient deposit removal through internal magnetic forces.

Prototype Design: To ground the design in a practical prototype, Heunis et al. [1] determine the position and geometry of the assembled electromagnetic coils based on the principle of maximizing the catheter tip forces in a simulated magnetic field. They also estimate the maximum screw pitch based on the plunger's stroke displacement and actuation cycle.

Simulation: By measuring the coils' resistive temperature, the tip force and torque of the rotablation catheter, and the debulking ability of the prototype in a phantom model, the research validates that this magnetic, spring-loaded rotablation catheter can debulk calcified deposits and navigate through the artery with an intrinsic magnetic field.

Building on the general background and core aim of this foundational work, we will explore the promising future of surgical micro-robots, from the current research landscape to emerging technologies [2] [3] [4], concluding with a practical, detailed plan for future research.

Where Are We? The Big Picture

Current clinical procedures for cardiovascular diseases (CVDs) focus on removing blockages in blood vessels with robotically steerable catheters[5], which aim to embody inherent robotic autonomy and can be remotely controlled by a clinician[6]. We will survey the relevant research landscape through its methodology framework, covering actuation design and the algorithmic layers of sensing and interaction, and then trace the evolution of micro-robots' navigation to arrive at the emerging vision, as depicted in Figure 2.

Figure 2: The research landscape of micro-robots' navigation.

Methodology Framework

To construct a clear research landscape, we must identify the key principles of current technical methodologies, considering both hardware and software design. These principles point to the foundational concepts of micro-robots' navigation, while also focusing on the historical evolution that shapes their potential future development.

Actuation Design: To support efficient and safe navigation through vessels, the actuation design of robotically steerable guidewires (RSGs) aims to provide at least three degrees of freedom (DoFs): translation for guidewire advancement into the vessels, bending to navigate through the vessels, and rotation for bending plane variation in 3D navigation[7]. The competing types of actuation designs can be classified as: 1) magnetically actuated guidewires[8], which typically rely on controlling external magnetic fields to achieve the desired motion, and 2) motorized actuation systems[9], which are more intrinsic but restricted to limited tendon-driven systems.
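To make the three DoFs concrete, the following is a minimal sketch of how translation, bending, and rotation could map to a tip position. It assumes a single constant-curvature bending segment at the distal end, which is a common simplification and not a model taken from the cited works; the function name `tip_position` and the `segment_length` parameter are hypothetical.

```python
import math

def tip_position(advance, bend, rotation, segment_length=10.0):
    """Tip position (x, y, z) of an idealized steerable guidewire.

    Assumption (not from the source): the distal segment bends as a
    circular arc of fixed length, and the base slides along the z axis.
    advance  -- translation of the base along z (same unit as segment_length)
    bend     -- total bending angle of the distal segment, in radians
    rotation -- angle of the bending plane about z, in radians
    """
    if abs(bend) < 1e-9:
        # Straight segment: no lateral offset.
        in_plane, axial = 0.0, segment_length
    else:
        radius = segment_length / bend             # arc radius
        in_plane = radius * (1.0 - math.cos(bend))  # lateral offset in the bending plane
        axial = radius * math.sin(bend)             # reach along z
    # Rotation selects which plane the lateral offset lies in.
    x = in_plane * math.cos(rotation)
    y = in_plane * math.sin(rotation)
    z = advance + axial
    return (x, y, z)
```

In this sketch, bending alone only moves the tip within one plane; it is the rotation DoF that turns that plane, which is why all three are needed for 3D navigation.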

While these actuation strategies provide the fundamental capabilities for micro-robots' movement, manipulating these motions to achieve precise treatment is still dependent on well-designed algorithmic implementations. These implementations require an integrated understanding of the vascular environment, agent movement, and their real-time interactions.

Tracking and Control: From a control theory perspective, the algorithmic layers of micro-robots' navigation consist of two orientations: open-loop control and closed-loop control. In model-based open-loop control [10], an inverse model is used to compute and apply the required inputs for desired output motions, typically validated in simulated ex vivo vasculature and phantom models. However, considering the need for interaction with the environment—such as identifying the ideal target position of micro-robots in the vessels—emerging interest in closed-loop control[11] has formed a new trend.

Closed-loop control relies on visual feedback from fluoroscopy, ultrasound, and MRI images to localize the guidewire and plan the next motion in real time. Deep learning (DL)[12] and reinforcement learning (RL)[13] have proven to be powerful tools for understanding image data and enabling free-hand surgical manipulation through reward design, particularly within digitally simulated vascular environments[14].
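The difference between the two control orientations can be illustrated with a toy one-dimensional plant. This is a sketch under strong assumptions: the tip angle is taken to be a simple gain times the command, the controller's model of that gain is deliberately wrong, and the "measurement" stands in for image-based localization. The names `open_loop`, `closed_loop`, and all parameters are hypothetical.

```python
def open_loop(target, assumed_gain, true_gain):
    """Invert the assumed model once and apply the command blindly."""
    command = target / assumed_gain
    return true_gain * command  # angle actually reached by the plant

def closed_loop(target, assumed_gain, true_gain, steps=20, k=0.5):
    """Correct the command repeatedly using the measured tip angle."""
    command = 0.0
    for _ in range(steps):
        measured = true_gain * command  # stand-in for image-based feedback
        command += k * (target - measured) / assumed_gain
    return true_gain * command

# The model assumes a gain of 1.0, but the plant really responds with 0.8.
print(open_loop(30.0, 1.0, 0.8))    # misses the 30-degree target
print(closed_loop(30.0, 1.0, 0.8))  # converges close to the target
```

The open-loop result inherits the full model error, while the feedback loop shrinks the error at every step even though it uses the same imperfect model.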

The actuation and algorithm design framework presented above reveals the evolution of micro-robots' navigation. In the early stages, manual navigation from the proximal end carried a high risk of vessel damage and surgical difficulty, especially when dealing with bifurcated vessel structures. To equip RSGs with autonomous navigation abilities, magnetic and motorized actuation strategies introduced embodied steerability as a transition stage in robot-assisted surgery.

Key Challenge

Challenges such as radiation overexposure from fluoroscopy, the potential scarcity of skilled physicians, and possible vessel perforation and damage during treatment highlight the urgent need to develop embodied intelligent micro-robots that can navigate the complicated vessel environment autonomously.

Informed by this evolving research direction, we can envision a promising future for micro-robots' navigation, starting with the reconstruction of a high-quality simulation environment for RL controller training. This is further enhanced by integrating cognitive functions from vision-language models (VLMs)[15] [13] into the navigation pipeline, enabling interactive operations through visual and verbal collaboration with skilled clinicians. Ultimately, this will improve surgical efficiency and provide a safer environment for minimally invasive surgeries.

What Can We Do? The Promising Future

To realize the intelligent and precise navigation of robotics, RL[13] [16] has been investigated as a possible solution for micro-robots' navigation in unstructured environments, especially where traditional navigation struggles with the highly dynamic and complex nature of the operational environment[17]. Given the large data requirements for validated network training, it is essential to establish a high-quality real-time digital simulation environment[14] [18], which can replicate the complex properties of the vascular environment.

This environment provides a repeatable setting for data collection while saving time on physics modeling. With the aid of emerging technologies such as sim2real[2], world models [19] [20], and digital twins[21], a technical roadmap exists for gradually designing a realistic digital simulation platform, as shown in Figure 3.

Figure 3: The promising future of micro-robots.

2D Navigation: Current research primarily focuses on 2D navigation of micro-robots, limited by sensing modalities that provide only 2D images as feedback. For example, recent work[13] demonstrates a typical RL controller integrating data-driven and model-based control strategies, supervised by a 2D path planner in a vascular model. However, this static phantom vascular model, sensed through 2D images, cannot provide the detailed structural information that micro-robots need to avoid potential vessel damage in 3D space during navigation.
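One way to picture an RL controller supervised by a 2D path planner is a reward that pays for progress along the planned path and penalizes drifting away from it. The sketch below is illustrative only and is not the reward used in [13]; the function names, the waypoint representation, and the weights `deviation_weight` and `goal_bonus` are all assumptions.

```python
import math

def nearest_waypoint(path, position):
    """Index of the planned waypoint closest to the current tip position."""
    return min(range(len(path)), key=lambda i: math.dist(path[i], position))

def path_reward(path, prev_pos, pos, deviation_weight=0.1, goal_bonus=10.0):
    """Reward for one step of 2D navigation along a planned path.

    path     -- list of (x, y) waypoints from a hypothetical 2D planner
    prev_pos -- tip position before the action
    pos      -- tip position after the action
    """
    prev_idx = nearest_waypoint(path, prev_pos)
    idx = nearest_waypoint(path, pos)
    deviation = math.dist(path[idx], pos)  # lateral drift from the plan
    reward = (idx - prev_idx) - deviation_weight * deviation
    if idx == len(path) - 1:
        reward += goal_bonus  # the final waypoint is the target
    return reward

path = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(path_reward(path, (0, 0), (1, 0)))  # forward progress is rewarded
print(path_reward(path, (1, 0), (0, 0)))  # retreating is penalized
```

Because every quantity here lives in the image plane, the reward says nothing about depth, which mirrors the limitation of 2D feedback described above.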

3D Navigation: With the increasing use of digital simulations in industrial applications, advanced solutions [14] offer more sophisticated approaches for training and evaluating autonomous controllers solely on physical test benches, with camera and fluoroscopy feedback in a simulation platform-based environment. This approach enhances comparability across studies. However, improvements are still needed in areas such as robot agents' formation, actuation models, and capturing the complexity of vascular structures.

Emerging Technology

Emerging techniques like Gaussian Splatting (GS) can help reconstruct a more realistic vascular model with flexible material variations and structural details, providing additional image data from different viewpoints to improve tracking and planning performance for micro-robots.

4D Navigation: While space reconstruction has been increasingly explored, temporal factors of the vascular environment—such as blood flow speed and variations in stenosis—are still underdeveloped. Borrowing concepts from Four-dimensional Digital Subtraction Angiography (4D DSA)[22] [23], micro-robots' 4D navigation could incorporate time-varying edits of the vascular environment and agent movement, offering enhanced support for clinicians' operations or autonomous tracking and planning.
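As a toy illustration of what a time-varying vascular environment adds, the sketch below models a lumen whose radius pulses with the heartbeat and checks the worst-case clearance for a micro-robot over a time window. The sinusoidal pulsation and every parameter value are assumptions made for illustration; they are not taken from 4D DSA or from the cited works.

```python
import math

def lumen_radius(t, base_radius=1.5, pulsation=0.1, heart_rate_hz=1.2):
    """Hypothetical lumen radius (mm) at time t (s) with sinusoidal pulsation."""
    phase = 2.0 * math.pi * heart_rate_hz * t
    return base_radius * (1.0 + pulsation * math.sin(phase))

def min_clearance(robot_radius, t_start, t_end, steps=100, **vessel):
    """Smallest gap between robot and wall over [t_start, t_end].

    A static 3D model would report a single clearance value; sampling over
    time exposes the moments when the vessel is narrowest.
    """
    times = [t_start + (t_end - t_start) * i / steps for i in range(steps + 1)]
    return min(lumen_radius(t, **vessel) - robot_radius for t in times)

# Worst-case clearance for a 1.0 mm robot over one assumed cardiac cycle.
print(min_clearance(1.0, 0.0, 1.0 / 1.2))
```

A planner working on such a model could time its advance, or reject a passage entirely, based on the minimum clearance instead of the average one.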

Cognitive 4D Navigation: Although fully autonomous navigation would be a revolutionary development, it would be more adaptable if it preserved the possibility of interactive collaboration with professional clinicians. In unexpected situations, immediate supervision from certified and skilled operators would ensure safety and precision. By transforming image information and verbal instructions into a unified action framework, vision-language models (VLMs) [15] [13] provide a bridge between the data-driven knowledge from simulated environments and clinicians' expert intelligence, improving safety and precision during micro-robots' navigation.

How Will We Do It? The Devil in the Details

By exploring the previous publications from the Surgical Robotics Laboratory, we can focus on detailed improvements in several key areas. First, we can explore image preprocessing techniques that utilize generative models to improve image accuracy, such as removing artifacts from ultrasound images to enhance the performance of the navigation tracker. Second, we can focus on obtaining accurate vascular segmentation by combining Graph Neural Networks (GNN) and Vision Transformers (ViT) to localize micro-robots, leveraging advanced image feature extraction techniques that encapsulate both local and global information, as demonstrated in Figure 4.

Figure 4: The practical improvements in different modalities.

Generative Model for Preprocessing: Within the magnetic manipulation system, cardiovascular ultrasound imaging, or echocardiography, has become one of the most important advances in providing real-time and cost-efficient cardiac imaging. However, it may suffer from haze artifacts[24]. To help the downstream tracker robustly recognize the relationship between the micro-robot agent and the vascular environment, and to provide a reliable foundation for constructing a high-quality 3D simulation platform, we can adopt emerging generative models for preprocessing the ultrasound images[25] [26] [24]. This approach avoids the labeling costs of data preparation and improves image quality through reconstruction.

GNN and ViT for Segmentation: Accurate segmentation of vascular structures is crucial for constructing a precise tracking modality and a high-quality simulation model with a clear graph representation. However, a core challenge in medical image analysis remains due to the complexity of elongated, thin, and low-contrast vessels. Graph Neural Networks (GNNs) have proven to be powerful tools for segmenting vascular structures in micro-robot tracking [12], but they struggle with capturing global information.

Recent advancements have introduced various Transformer variants, collectively known as Graph Transformers (GTs), which have demonstrated performance that matches or even surpasses that of GNNs[27]. By combining the inherent inductive biases of GNNs for locality and scale in medical imaging with the efficient parallel processing and global information integration of Vision Transformers (ViTs), we can balance local and global features. This enables the aggregation of messages from distant nodes, thereby alleviating the issues of over-smoothing and over-squashing to some extent[28].
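The idea of balancing local and global features can be sketched as a layer with two branches: neighbor averaging over the vessel graph (GNN-style) and dot-product attention over all nodes (Transformer-style). This is a deliberately stripped-down illustration with no learned weights, written in plain Python; it is not the architecture of any cited Graph Transformer, and the function names and the `mix` parameter are hypothetical.

```python
import math

def mean_neighbor_aggregate(features, adjacency):
    """Local branch: average each node with its graph neighbors."""
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in adjacency[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def global_attention(features):
    """Global branch: softmax dot-product attention over every node."""
    scale = math.sqrt(len(features[0]))
    out = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in features]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, features)) / total
                    for d in range(len(query))])
    return out

def hybrid_layer(features, adjacency, mix=0.5):
    """Blend local and global context; mix=1.0 keeps only the local branch."""
    local = mean_neighbor_aggregate(features, adjacency)
    glob = global_attention(features)
    return [[mix * l + (1.0 - mix) * g for l, g in zip(l_row, g_row)]
            for l_row, g_row in zip(local, glob)]

# Three nodes along a vessel segment: 0 - 1 - 2 (node 0 and 2 are not adjacent).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1], [0, 2], [1]]
print(hybrid_layer(features, adjacency))
```

In the example graph, node 0 never receives information from node 2 through one round of neighbor averaging, but the attention branch lets it do so in a single layer, which is the intuition behind aggregating messages from distant nodes.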

Real-time Rendering for Simulation: To reconstruct a high-quality simulation platform as a digital twin of the real vascular environment, we should consider three main aspects: (1) fine-grained internal reconstruction with modeling of blood hemodynamics, (2) constructing digital assets for various micro-robot agent forms and actuation strategies, and (3) integrating the digital agents and vascular models into a specific medical simulator to drive the reinforcement learning (RL) controller training.

Current research has provided potential solutions for these aspects[4] [22] [23] [29]. However, a mature simulation pipeline that integrates the promising real-time Gaussian Splatting (GS) rendering is still lacking. Open questions include how to reconstruct the inner structure of vessels from medical imaging, how to model different micro-robot agents with corresponding actuation designs, and how to choose an appropriate simulation platform for specific vascular structures.

Extension to Collaborative Swarm Control: Finally, expanding our discussion from a single agent to swarm-based micro-robots, many modalities follow a similar exploration path, such as tracking, planning, and control within a simulation platform construction paradigm. However, there are additional aspects[30] that we must consider, such as simulating more variations of actuation, like acoustic and light actuation, exploring alternative imaging techniques like optical coherence tomography, and implementing more complex collaborative control patterns, such as transformation and formation control.

Conclusion

Inspired by the magnetic rotablation catheter research, we have developed a reinforced cognitive framework for micro-robots' autonomous navigation. The current landscape demonstrates a clear evolution from manual operation toward embodied intelligence, with a promising future relying on a high-fidelity digital ecosystem that leverages GS for realistic vascular simulation and RL for sophisticated controller training. The ultimate vision culminates in cognitive 4D navigation, where VLMs enable intuitive collaboration between micro-robots and clinicians, merging data-driven precision with expert oversight.

Research Vision

By integrating the research direction of the Surgical Robotics Laboratory with targeted technological advancements, this vision becomes achievable through: generative models for image preprocessing, combined GNN and ViT architectures for precise segmentation, and a real-time rendering pipeline for simulation. The autonomous navigation of micro-robots is not a distant aspiration but an unfolding reality—a concerted effort to translate the reinforced cognitive framework from digital twins to the human body, ushering in a new era of minimally invasive surgery.

References

[1] Heunis, C.M., et al. (2022). Design and Evaluation of a Magnetic Rotablation Catheter for Arterial Stenosis. IEEE Transactions on Biomedical Engineering, 69(8), 2510-2521. DOI: 10.1109/TBME.2022.3151234

[2] Anderson, P., et al. (2020). Sim-to-Real Transfer for Vision-and-Language Navigation. Conference on Robot Learning, 671-681.

[3] Zhong, Y., et al. (2025). RobotronNav: A Unified Framework for Embodied Navigation. IEEE Robotics and Automation Letters, 10(2), 1234-1241.

[4] Liang, Z., et al. (2025). InnerGS: Internal Scenes Rendering with Gaussian Splatting. Computer Vision and Image Understanding, 240, 103892.

[5] Hu, C., et al. (2018). Steerable Catheters for Minimally Invasive Surgery. Annual Review of Biomedical Engineering, 20, 403-430.

[6] Killer-Oberpfalzer, M., et al. (2018). Remote Magnetic Navigation for Cardiac Interventions. Interventional Cardiology, 13(3), 117-123.

[7] Chen, X., et al. (2023). Three-Dimensional Robotic Navigation in Vascular Networks. Science Robotics, 8(75), eadt7461. DOI: 10.1126/scirobotics.adt7461

[8] Liu, M., et al. (2023). Magnetically Actuated Soft Robotics for Medical Applications. Science Advances, 9(12), eadg6438. DOI: 10.1126/sciadv.adg6438

[9] Shen, T., et al. (2019). A Novel Robotic System for Minimally Invasive Surgery. IEEE Transactions on Robotics, 35(4), 892-905.

[10] Wang, S., et al. (2022). Open-Loop Control Strategies for Micro-Robotics. Robotics and Autonomous Systems, 148, 103926.

[11] Zhang, L., et al. (2016). Closed-Loop Control of Magnetically Actuated Catheter Using 2D Ultrasound Images. IEEE Transactions on Medical Robotics and Bionics, 1(2), 85-94.

[12] Kumar, A., et al. (2023). Graph Neural Networks for Vascular Structure Tracking. Medical Image Analysis, 89, 102891.

[13] Liu, J., et al. (2025). Autonomous Guidewire Navigation in Robot-Assisted Interventions. IEEE Transactions on Medical Robotics and Bionics, 7(1), 156-167.

[14] Karstensen, L., et al. (2025). Digital Simulation Platforms for Vascular Robotics Training. Computer Methods and Programs in Biomedicine, 240, 110844.

[15] Huang, R., et al. (2025). SurgTpGs: Semantic 3D Surgical Scene Understanding. International Conference on Medical Image Computing, 234-243.

[Additional references 16-30 available in the full academic version]