Conclusion
The finished solutions for robot manipulation/grasping and object detection worked well individually, but integrating the two proved difficult. Our original plan was to place two AR tags on the corners of the table and locate objects relative to those tags, making it easy to map their coordinates into the world frame. We trained the network on a black surface because a white surface left the camera view too bright and overexposed. However, the ar_track_alvar package could not track the AR tags consistently on a black surface, especially at the far end of the camera's field of view. [Add discussion about object recognition results]. Grasping and moving the wooden blocks (with AR tags attached) was fairly robust for n (number of objects) less than 5. The first two design criteria were met fairly reliably, but due to time constraints the last criterion was not met as consistently as we had hoped.
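The corner-tag idea above can be sketched as a simple interpolation: given the pixel and world coordinates of the two corner tags, an object's pixel position maps linearly into table coordinates. This is only a minimal illustration under assumed values (the tag positions, and an overhead camera with image axes aligned to the table), not our actual calibration code.

```python
import numpy as np

# Hypothetical pixel and world (table-frame) positions of the two corner tags.
tag1_px, tag2_px = np.array([120.0, 80.0]), np.array([520.0, 400.0])
tag1_w, tag2_w = np.array([0.40, -0.30]), np.array([0.90, 0.30])

def pixel_to_world(obj_px):
    """Map an object's pixel coordinates to table coordinates by linear
    interpolation between the two corner tags (assumes the camera looks
    straight down and the table axes align with the image axes)."""
    t = (np.asarray(obj_px, dtype=float) - tag1_px) / (tag2_px - tag1_px)
    return tag1_w + t * (tag2_w - tag1_w)

# A point halfway between the tags in the image lands halfway on the table.
print(pixel_to_world([320.0, 240.0]))  # → [0.65 0.  ]
```

This breaks down exactly when the tracker loses a corner tag, which is why inconsistent tag detection on the black surface blocked the integration.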
The object detection network works very well in the same setting as its training data (black board) but performs poorly in a new setting (white board), since it never saw such data during training. This could be resolved by training the network on objects over a variety of backgrounds. Another issue in combining the computer vision and manipulation pipelines is the reference AR tag: because the robot uses the tag's known size as its measurement scale, it is critical that this size be accurate. Inaccuracy in the size of our reference AR tag meant the combination of the two techniques did not work very well.
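To see why the reference tag's size matters so much, note that the image scale is derived directly from it, so any error in the assumed tag size propagates linearly into every measured distance. A minimal sketch, with an assumed 5 cm tag (the constant and function here are illustrative, not our actual code):

```python
TAG_SIDE_M = 0.05  # assumed physical side length of the reference tag (metres)

def metres_per_pixel(tag_side_px):
    """Derive the image scale from the reference tag's apparent size in pixels.
    An error in TAG_SIDE_M propagates linearly into every measured distance."""
    return TAG_SIDE_M / tag_side_px

# A 5 cm tag spanning 100 px gives 0.5 mm per pixel.
scale = metres_per_pixel(100.0)
# If the tag is really 5.5 cm, every measurement is off by the same 10%.
scale_wrong = 0.055 / 100.0
print(scale, scale_wrong / scale - 1.0)
```

A few millimetres of error in the assumed tag size is enough to push the gripper off-center at the far side of the table, which matches the misalignment we observed.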
In terms of robot grasping and movement, the main issue was that Baxter's motion was slightly inaccurate and imprecise. Sometimes the gripper would be slightly off-center from the AR tag and therefore fail to grab the object.
Our project has plenty of room for improvement and extension. Continuous tracking should be implemented so that Baxter's gripper homes in on an object's location. Baxter's movements and inverse kinematics solver are also quite slow; while the actual joint velocities may be limited, one option would be to forgo the inverse kinematics solver and control the joints independently, which might speed up the entire process. For simplicity, we used wooden cylindrical and rectangular blocks of the same height so we could easily grasp them all from above at a fixed height. Realistically, objects of different shapes and sizes should be grasped differently; object-dependent grasping should be implemented so that distinct rigid objects can each be grasped optimally. We should also be able to detect, via force sensing, whether an object was grasped successfully. Path planning could be improved with simple graph traversal (Dijkstra's algorithm, for instance) to minimize the distance traveled by Baxter. All of these improvements would help the overall robustness of our system, especially for handling a greater number of objects and edge cases.