Over the last decade and a half, I have been immersed in creating companies and the trials and excitement which go with them. What follows is a list of my memories, mentoring, meetings, mistakes, musings and words of wisdom I was taught or learned the hard way, about business, startups and life.
I am writing this so that I do not forget them.
With the proliferation of startups building interactive robots, drones, AR and the like, there is a huge need to improve the quality of integrated embedded systems, the tutorials and support around them, and the overall development mindset.
While we were selling Sticky, I wanted to get back up to speed on the state of embedded development, cameras, servos, etc., so I decided to create a robot "head" with the goal of having it respond to its environment, speak and hear, and recognize people.
To make it more interesting, I decided to have it respond with 'robot emotions' where it has a simple vocabulary it can use to emote and interact. You can see some of it in the video below showing engagement and puzzlement.
A quick overview of the robot's movement.
I was surprised how much effort still went into booting the hardware, getting the OS to recognize the cameras, enabling the cameras stably in OpenCV, and so on. It felt very much like 1999.
Additionally, most forum support and tutorials stop at the "hello world" phase, where they get something to wiggle or respond in a simple way but don't go further. This means that while it is simple to get a camera working that you can show off to your friends, or to move an actuator, the step up to a functional system is an order of magnitude more difficult and people are mostly on their own.
Below are a set of reflections on the process and overall state of development.
Robot Platform Details
The physical platform is a combination of prefabbed aluminum parts and custom 3D printed ones. For the software, I wrote a web API using Tornado to control the robot and a portal in React.
Hardware Platform
Wide-angle environment camera mounted on the base
Narrow-field eye camera
Five 180-degree servos
Raspberry Pi and 16-channel servo controller
Aluminum and 3D-printed skeleton
Side view of the robot showing the 5 degrees of freedom, the environment camera mounted on the base and the narrow-field camera on the top of the head.
The head of the robot can tilt and the eye can look around quickly.
Software Platform
Core: Python core system running on the Raspberry Pi.
Vision: OpenCV for image de-warping, face detection, face recognition and background subtraction (see the sketch after this list)
Speech to Text: Sphinx, remote Google, etc.
Text to Speech: eSpeak
Web API: Tornado web server and WebSockets. (Docs are here)
Remote Web Portal: React portal hosted on GitHub here.
Repository: on GitHub.
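As a taste of what the Vision module does, here is a minimal face-detection sketch in OpenCV. The cascade file and camera index are my own illustrative choices, not the robot's exact pipeline.

```python
# Minimal OpenCV face-detection sketch along the lines of the Vision module.
# The cascade file and camera index are illustrative assumptions; cv2.data
# ships with the pip wheels, while a source build may need an explicit path.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # environment camera; the index is an assumption
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print("faces found: %d" % len(faces))
cap.release()
```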
Raspberry Pi with a 16-channel servo controller shield, using the PiCamera and a USB camera input.
One of the things that became clear very early on was that, with the system running on embedded hardware, I needed a remote interface and remote processing to interact with it rather than trying to do everything over SSH or on the Pi's local desktop. I searched around for a good pre-made solution and did not find one, so I designed an API using Tornado and wrote the web client in React.
The end result is a statically hosted portal, with security, that can log in to the robot and control everything. It supports multiple simultaneous connections, reconnects after a drop in connectivity and introspects the robot using the STATE message, which allows dimensions of control to be added, removed or changed dynamically without writing any more code.
The React web interface for the robot shows real-time data for each dimension of the robot, multiple cameras and logging over a WebRTC connection.
Getting to the point of having stable logging and interaction with the robot took a significant amount of time, but it turned out to be well worth it: it both ensured that the modules were well separated and enables off-robot processing of the more complex elements with easy debugging.
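To make the STATE-based introspection concrete, here is a stripped-down sketch of the kind of Tornado WebSocket handler involved. The message fields and dimension names are illustrative, not the robot's actual schema (see the linked API docs for that).

```python
# A stripped-down sketch of a Tornado WebSocket handler with a STATE message.
# The message fields and dimension names are illustrative assumptions.
import json

import tornado.ioloop
import tornado.web
import tornado.websocket


class RobotSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        # On connect, describe the controllable dimensions so the portal can
        # build its UI dynamically instead of hard-coding controls.
        self.write_message(json.dumps({
            "type": "STATE",
            "dimensions": {"head_tilt": {"min": 0, "max": 180, "value": 90}},
        }))

    def on_message(self, message):
        command = json.loads(message)
        # A real system would route this to the servo / vision modules.
        print("received command:", command)


if __name__ == "__main__":
    app = tornado.web.Application([(r"/ws", RobotSocket)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```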
Challenges in Developing a Robot from the Ground Up
Your first hundred hours feel wasted.
In setting things up, I was surprised how primitive the support still is for integrating sensors, embedded hardware, drivers, etc. Coming back from an AWS environment for the last few years, where the expected quality of infrastructure usually includes things like "designed for 9 9's", the stability of the current robot, computer vision and other systems feels like it has at best one 9 out of the box.
This led to 50+ hours of finding different versions of libraries, adjusting the BIOS, compiling OpenCV 3, re-linking to/from OpenCV 2, being misled by old documentation and tutorials, and finally deciding I needed to build both a web server and remote access to the system to debug it fully.
1. Choosing the best programming language and architecture is unclear
For this project, I chose the Raspberry Pi as the base platform. This made Python the easy choice of programming language; however, the question then arises of how to handle library installs (here is my install script, which gets a new system 90% of the way there), message passing, time-sensitive updates and many other things.
There is a system called the Robot Operating System (ROS) which many people use; however, it is poorly supported, needs patching for actual production work and was designed many years ago, so it doesn't take advantage of the cloud, WebSockets, WebRTC, language updates and many other things. Additionally, the corporate teams I have talked to who are currently using it in production don't hold it in especially high regard. After trying to get it running and working through the forums, this time I decided to write my own core system for this robot.
If I were to do it again with a specific task and a larger team, I would definitely review ROS more closely; however, it honestly didn't feel worth it here. Also, it is basically just a message-passing system rather than an "OS", so you still have to write everything else around it.
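For what it's worth, that message-passing core doesn't need to be complicated. A minimal in-process version, assuming a fixed-rate servo loop fed by a thread-safe queue (the names, rates and payload are illustrative, not my actual implementation), looks roughly like this:

```python
# A minimal in-process message-passing core: a thread-safe queue feeding a
# fixed-rate servo update loop. Names, rates and the queue payload are all
# illustrative assumptions.
import queue
import threading
import time

commands = queue.Queue()


def servo_loop(rate_hz=50):
    period = 1.0 / rate_hz
    targets = {}
    while True:
        start = time.time()
        # Drain any pending commands without blocking the loop.
        while not commands.empty():
            channel, angle = commands.get_nowait()
            targets[channel] = angle
        # ...push `targets` to the servo controller here...
        time.sleep(max(0.0, period - (time.time() - start)))


threading.Thread(target=servo_loop, daemon=True).start()
commands.put((0, 90))  # e.g. center the head-tilt servo
time.sleep(0.1)        # give the loop a moment before the script exits
```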
2. Finding stable builds of libraries is hit or miss
While apt-get and pip exist and host lots of libraries, most packages are out of date, are only partial builds (without the extended modules) or were created by an individual who used them for a specific task and didn't test for stability across use cases.
Getting the OpenCV 3 Python bindings built and working turned out to be quite a challenge until I found one person who had created a stable build (which was hard to find).
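Once a build does install, a quick sanity check saves a lot of head-scratching: print the version and the build information so you can see which camera backends and extra modules actually made it in.

```python
# Sanity-check an installed OpenCV build: print the version and the compiled-in
# options so you can see whether V4L, GStreamer and the extra modules made it in.
import cv2

print(cv2.__version__)
print(cv2.getBuildInformation())
```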
3. Logging Systems were created for Servers, not Robots
Working on a robot needs a very different perspective than working on servers in the cloud or on a local computer. Hard crashes, physical actuator issues, camera illumination changes, etc can't be summed up in simple log messages.
One example I ran into: even though the system was not resource constrained, enabling face detection and recognition made the update loop for the servo positions less stable, which resulted in unstable movement and significant jerks. For initial debugging I just assumed that the servo hardware was not as good as I had expected; only through trial and error, by disabling all vision processing, did the robot return to smooth movement. Ideally, there should be an easier way to assess this.
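A small amount of loop-timing telemetry would have exposed that much earlier. A hedged sketch of what I mean, with an arbitrary 50 Hz budget and a hypothetical step() callback:

```python
# Sketch of the loop-timing telemetry that would have exposed this earlier.
# The 50 Hz budget and the step() callback are assumptions for illustration.
import time


def timed_loop(step, rate_hz=50, report_every=100):
    period = 1.0 / rate_hz
    overruns, count = 0, 0
    while True:
        start = time.time()
        step()  # e.g. update servo targets
        elapsed = time.time() - start
        if elapsed > period:
            overruns += 1  # the step blew through its ~20 ms budget
        count += 1
        if count % report_every == 0:
            print("loop overruns: %d/%d" % (overruns, count))
        time.sleep(max(0.0, period - elapsed))
```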
Therefore, a different type of logging mentality is needed. Something more like this:
Real-time telemetry and replay of the dimensions of the robot (sketched after this list)
Efficient video image upload (Based on bandwidth)
Code logging of key elements - User defined
System logging for the embedded hardware - CPU, etc
Event triggers for 'in cloud' elements of the system
'Black Box' hard failure debugging when everything goes wrong
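As a rough illustration of the telemetry and system-logging items above, a telemetry record might be nothing more than a periodic JSON snapshot of every dimension plus basic board health, appended to a file so a run can be replayed later. The field names here are assumptions.

```python
# Rough illustration of the telemetry / system-logging items above: a periodic
# JSON snapshot of every robot dimension plus basic board health, appended to a
# file so a run can be replayed later. Field names are assumptions.
import json
import time


def read_cpu_temp():
    # The Raspberry Pi exposes the SoC temperature here, in millidegrees C.
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read().strip()) / 1000.0


def telemetry_snapshot(dimensions):
    return {
        "ts": time.time(),
        "dimensions": dict(dimensions),  # e.g. {"head_tilt": 92, ...}
        "cpu_temp_c": read_cpu_temp(),
    }


with open("telemetry.log", "a") as log:
    log.write(json.dumps(telemetry_snapshot({"head_tilt": 92})) + "\n")
```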
Systems like Loggly, Splunk, etc. are just not built with this mentality and can only partially cover the user logging and system logging.
Instead, a control panel would be massively helpful for debugging and understanding the state of the robot.
That is why I had to write the API and web portal so that I wasn't hacking the system apart every time I wanted to see an output while developing on it.
4. Hardware / Software Interfaces are not Documented
In using the servo controller board, the drivers expose a simple PWM interface where you set the frequency and duty cycle. What is not documented anywhere is that servos use a pre-1960s protocol: rather than using 100% of the duty cycle to give very high resolution, you have to keep the frequency at 50Hz and the pulse width between 0.5 and 2.5 msecs, where 1.5 msecs centers the servo. Here is a rant on it from when I sorted it all out. I was surprised how lacking the documentation was, both on the drivers and across the internet, and I may have blown up a servo in the process of trying to sort it out...
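Here is that arithmetic spelled out as code. I'm assuming a PCA9685-style 16-channel controller driven through the Adafruit_PCA9685 library (the actual board and driver may differ); the 50Hz / 0.5-2.5 msec mapping is the part that matters.

```python
# The servo arithmetic spelled out as code. Assumes a PCA9685-style controller
# driven through the Adafruit_PCA9685 library; the actual board may differ.
import Adafruit_PCA9685

FREQ_HZ = 50                            # standard RC servo frame rate
PERIOD_MS = 1000.0 / FREQ_HZ            # 20 ms frame
MIN_PULSE_MS, MAX_PULSE_MS = 0.5, 2.5   # 1.5 ms centers the servo

pwm = Adafruit_PCA9685.PCA9685()
pwm.set_pwm_freq(FREQ_HZ)


def set_angle(channel, angle_deg):
    """Map 0-180 degrees onto the 0.5-2.5 ms pulse window."""
    pulse_ms = MIN_PULSE_MS + (angle_deg / 180.0) * (MAX_PULSE_MS - MIN_PULSE_MS)
    ticks = int(pulse_ms / PERIOD_MS * 4096)  # the PCA9685 counts in 12-bit ticks
    pwm.set_pwm(channel, 0, ticks)


set_angle(0, 90)  # center the servo on channel 0
```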
The same thing applies to the cameras and sound. It takes significant fiddling in the BIOS and other areas to enable the cameras stably so that their order doesn't swap, they appear both to the OS AND to Python, and they provide stable images.
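One small thing that helps is a quick probe to see which camera indices actually deliver frames after a reboot; the range of indices here is a guess.

```python
# Quick probe to see which camera indices actually deliver frames after a
# reboot, since the ordering can swap. The range of indices is a guess.
import cv2

for index in range(4):
    cap = cv2.VideoCapture(index)
    ok, _ = cap.read()
    print("camera %d: %s" % (index, "frames OK" if ok else "no frames"))
    cap.release()
```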
5. 99% of Tutorials / Forum Answers Stop at "Hello World for X"
For normal development, Stack Overflow and other places have production-level answers to challenging questions, posted by people creating high-quality implementations. For robotics, it seems that most of the people who are competent with these systems are not posting answers, so small things can hang up a new developer simply because they can't find the documentation or work out an edge case.
It feels like there are just not enough people out there collaboratively using/designing these types of systems to have coverage of all the edge cases that people will hit.
We need something better. Now.
After getting to this stage, my gut feeling is that the world is ready for a next generation of development tools, support and infrastructure, because most of the time I spent getting the system ready went to solving standard issues that were solved for servers many years ago.