Hidden Challenges of Deploying Robots in the Real World
Part 4: DevOps and Software Development Practices
There’s a world of difference between getting a demo to work and building a robotic system that operates reliably multiple hours a day, every day, for years. In robotics, early-stage prototypes and research systems often prioritize quick development and visible results — an approach that mirrors top-down development. This method works well for small projects, demos, and simple use cases developed by one or two researchers.
However, for robots to function at scale in real-world environments — where they need to be adaptable, scalable, reliable, and supported by teams of software engineers — robot software development must follow a bottom-up approach. This involves careful vertical integration, robust software infrastructure, and the development of critical DevOps practices. While bottom-up development takes longer, it is the only way to build systems that can withstand the challenges of deployment at scale.
In this installment, I’ll share the DevOps tools, frameworks, and practices we’ve adopted at Akara to enable scalable, maintainable, and reliable robotic software development. Transitioning from research prototypes to production-ready robots requires a shift in software practices. Research code, while functional for demos, often lacks the reliability needed for real-world deployments. Developing robust embedded systems, implementing over-the-air updates, and managing contributions from multiple developers all require significant investment in DevOps and build systems.
Software Development Frameworks
One of the first lessons when transitioning from research to industry is the need for embedded and headless operating systems rather than general-purpose desktop operating systems. Research prototypes often rely on systems that prioritize ease of use, but in production environments, reliability, efficiency, and safety take precedence.
While tools like the Robot Operating System (ROS) have become popular in robotics research, they are not inherently suitable for production deployment. ROS and other open-source frameworks provide excellent functionality, but they often lack the stability, security, and long-term reliability required in commercial systems. At Akara, like most companies professionally deploying robots, we’ve developed a hybrid codebase that combines select components of the ROS stack with custom, internally developed modules optimized for our specific needs.
To further enhance software reliability and ensure compliance, we developed a Software Bill of Materials (SBOM). This catalog lists all software packages in use, including their licenses and dependencies. Open-source components, in particular, are carefully reviewed to confirm that their licenses allow for commercial deployment and that they are patched and secure.
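As a rough illustration of the license review described above, here is a minimal Python sketch. It is not Akara's tooling: the component names, versions, and the allowlist of approved licenses are all hypothetical, and a real policy would come from legal review rather than a hard-coded set.

```python
from dataclasses import dataclass

# Hypothetical allowlist of SPDX-style license identifiers.
# Illustrative only; not legal guidance.
APPROVED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}


@dataclass(frozen=True)
class Component:
    """One SBOM entry: a package, its license, and its direct dependencies."""
    name: str
    version: str
    license: str
    dependencies: tuple = ()


def review_sbom(components):
    """Return the components whose license is not on the approved list."""
    return [c for c in components if c.license not in APPROVED_LICENSES]


# Hypothetical SBOM entries for illustration.
sbom = [
    Component("nav-core", "1.4.2", "Apache-2.0", ("geometry-utils",)),
    Component("geometry-utils", "0.9.0", "BSD-3-Clause"),
    Component("vision-plugin", "2.1.0", "GPL-3.0-only"),
]

flagged = review_sbom(sbom)
for component in flagged:
    print(f"needs review: {component.name} {component.version} ({component.license})")
```

Keeping the catalog as structured data means the same list can feed both a license check like this one and a separate check for unpatched versions.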
Software Development Practices
Managing robotic systems requires formally organizing software development. Robots run tens of thousands of lines of code across multiple layers — from embedded systems and middleware to application software and interfaces for peripherals like mobile apps. These codebases must be carefully managed and updated to maintain functionality and reliability at scale.
Here are some of the practices we use to streamline development:
Version Control: A centrally managed repository serves as the foundation for all software development. This ensures that concurrent changes made by developers are carefully tracked, reviewed, and integrated. Our repositories encompass embedded systems software, robot operating software and middleware, application software (including site-specific configurations), web-ops software, and mobile app software for platforms like iOS and Android.
Containerization: With robots running large and complex software stacks, containers provide a lightweight, portable way to package, deploy, and run applications and their dependencies. Containers allow us to isolate processes and services on the robot, implement security best practices such as the principle of least privilege, and move away from running source code directly on devices.
Ticketing Systems: Formal processes for identifying, prioritizing, and resolving tasks are critical for managing software development. We use ticketing systems to track bugs as they emerge, prioritize fixes based on their impact, assign issues to software engineers, and keep engineering teams clearly informed about all aspects of robot software development.
Continuous Integration and Continuous Deployment (CI/CD)
Like smartphones, robots require regular updates to remain functional, secure, and efficient. However, robotic systems are far more complex due to their integration of sensors, actuators, and complex subsystems. Managing updates for multiple software components — while ensuring that every change works reliably — requires robust frameworks that enable multiple engineers to develop, test, and deploy code seamlessly.
At Akara, our CI/CD pipeline plays a critical role in supporting collaborative development and ensuring the stability of our robotic systems:
Continuous Integration (CI): CI automatically tests code changes from multiple contributors as they are merged into a shared repository. This ensures that the work of many engineers (across navigation, perception, hardware interfaces, and supporting applications) is continuously integrated and validated, preventing conflicts and errors from accumulating.
Continuous Deployment (CD): CD automates the deployment of validated updates to robots, enabling over-the-air updates. This allows us to deliver remote bug fixes, performance improvements, and new features quickly. It also maximizes device uptime and reduces operational costs, because updates can be made without Akara staff needing to be on-site.
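To make the over-the-air step more concrete, the sketch below shows one way a robot might decide whether to install an update it has downloaded. This is an assumption-laden illustration, not a description of Akara's pipeline: the `UpdateManifest` type, the `should_install` function, and the rule that updates are applied only while the robot is idle are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateManifest:
    """Hypothetical metadata published alongside an update payload."""
    version: tuple   # e.g. (1, 5, 0)
    sha256: str      # expected hex digest of the payload


def should_install(current_version, manifest, payload, robot_idle):
    """Install only newer, uncorrupted updates, and only while the robot is idle."""
    if manifest.version <= current_version:
        return False  # already up to date, or the update is a downgrade
    if hashlib.sha256(payload).hexdigest() != manifest.sha256:
        return False  # payload was corrupted or tampered with in transit
    return robot_idle  # assumed policy: never interrupt an active task


# Hypothetical payload and manifest for illustration.
payload = b"example update image"
manifest = UpdateManifest((1, 5, 0), hashlib.sha256(payload).hexdigest())

print(should_install((1, 4, 2), manifest, payload, robot_idle=True))
```

A checksum alone only detects corruption against a trusted manifest; a production system would typically also verify a cryptographic signature on the manifest itself.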
Logging and Error Handling
Robots must be capable of self-diagnosing issues and responding to unexpected events. Effective logging and error handling are critical for identifying and resolving problems, especially in production deployments.
Logging: System states, warnings, and error messages are logged in real time, providing a clear picture of what the robot was doing when an issue occurred. This is essential for debugging and performance monitoring.
Error Handling: Protocols are in place to manage failures gracefully. For example, if a critical sensor stops working, the robot may safely power down or enter a standby state. If a non-critical error arises, the system can instead issue notifications to alert engineers or hospital staff.
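The two behaviours above can be sketched with Python's standard `logging` module. This is a minimal illustration under stated assumptions: the `Severity` and `RobotState` enums, the `handle_fault` function, the sensor names, and the `notify` callback are hypothetical and stand in for whatever fault classification and alerting channel a real system uses.

```python
import logging
from enum import Enum

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("robot.health")


class Severity(Enum):
    NON_CRITICAL = "non_critical"
    CRITICAL = "critical"


class RobotState(Enum):
    RUNNING = "running"
    STANDBY = "standby"


def handle_fault(source, severity, notify):
    """Log the fault, then either enter standby or notify staff and continue."""
    if severity is Severity.CRITICAL:
        log.error("critical fault in %s; entering standby", source)
        return RobotState.STANDBY
    log.warning("non-critical fault in %s; notifying staff", source)
    notify(f"Attention needed: {source}")
    return RobotState.RUNNING


# Hypothetical usage: collect notifications in a list instead of sending them.
sent = []
state_after_lidar = handle_fault("lidar", Severity.CRITICAL, sent.append)
state_after_gauge = handle_fault("battery_gauge", Severity.NON_CRITICAL, sent.append)
```

Passing the notifier in as a callback keeps the fault policy testable without a live alerting service.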
Conclusion
Building robotic systems that operate reliably at scale requires robust DevOps infrastructure, scalable software frameworks, and carefully managed development practices. Building this infrastructure requires deep expertise. For Akara, these systems have been built and refined over years, dating back to 2017, when we first started on our journey as robotics researchers working at Trinity College Dublin.
In the next installment of this series, I’ll discuss how we approach workflow integration, ensuring that our robotic systems seamlessly integrate into existing processes and environments to maximize their operational impact.