[Image: Researcher exploring the challenges of classification in AI. Credit: SINC]
We find it as easy to tell a sparrow from an owl as to point out the trees in a painting, whether it is a work by Picasso or the more realistic Constable. Robots, though, at least until recently, have struggled to distinguish a bird from a cat and, as for questions of art, are still a long, long way from becoming critics.
These sorts of conundrums are what consume the attention of the Australian Centre for Robotic Vision’s Gustavo Carneiro. As a result, in contrast to the usually specialised and narrowly focused nature of scientific research, he works in a remarkably wide range of fields.
In recent years, Carneiro, a computer scientist at the University of Adelaide, has collaborated on studies as diverse as the anatomy of the heart, the structure of cervical cells, the performance of pedestrian detectors and artistic image analysis.
From his perspective, however, these different areas are opportunities to refine a single avenue of investigation. “It’s all about machine learning and computer vision,” he says.
Carneiro is concerned with the challenges of classification in robotics. Key to this is the task of semantic representation: the method by which objects and processes in the material world are described in formal language, such that an artificial intelligence (AI) system can recognise them and combine them in useful ways.
Defining characteristics
He likes to use the example of birds to demonstrate the challenges. The apparently simple question, “what is a bird?”, is anything but. Representing the defining characteristics of all birds – feathers, beaks, wings and so on – for AI systems is itself laborious, but it is also nowhere near enough.
The semantics must also describe the different states of these characteristics – for instance, beaks open and closed or wings flapping against a background of trees. And, most importantly, the classifications should enable the AI system to infer and to add new information to its database.
It is no good, Carneiro says, having a robot that recognises 1000 species of birds if it crashes when it encounters species 1001.
“It’s basically unsustainable,” he explains. “You would probably have to put a new node in your model and retrain the whole thing. Let’s say in the future, new things happen. Then you must retrain it again.
“The training would become more and more complicated as you move on: you have 1002, then 1003, and, who knows, maybe you reach 100,000 things you have to recognise. It will never be enough, because there will always be new things to include.”
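To see why, consider a minimal sketch, in PyTorch, of the kind of design he is describing (the numbers and names here are hypothetical, not Carneiro’s actual system): a conventional classifier ends in an output layer with exactly one node per known class, so species 1001 forces the layer to be rebuilt and the model retrained.

```python
import torch
import torch.nn as nn

NUM_FEATURES = 512   # hypothetical size of the image-feature vector
NUM_CLASSES = 1000   # species the model was originally trained on

# A conventional classifier: one output node per known bird species.
classifier = nn.Linear(NUM_FEATURES, NUM_CLASSES)

# Species 1001 appears. In this design the only option is to rebuild
# the output layer with an extra node...
new_classifier = nn.Linear(NUM_FEATURES, NUM_CLASSES + 1)
with torch.no_grad():
    # ...copy across the old weights, and then retrain: the new node
    # starts from scratch, and the old classes must be recalibrated
    # against it on the full dataset.
    new_classifier.weight[:NUM_CLASSES] = classifier.weight
    new_classifier.bias[:NUM_CLASSES] = classifier.bias
```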
Retraining AI systems
In the real world, he adds, retraining AI systems takes time, money and a lot of computing power. So it’s better, surely, if classification systems can be designed to adapt and incorporate new data without outside intervention.
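One family of techniques points in that direction. As a hedged sketch of the general idea (not necessarily the approach used at the Centre), a model can classify in a fixed embedding space, where each class is just a stored prototype vector: adding a class means adding a prototype, with no retraining of the network itself.

```python
import numpy as np

# One mean embedding vector ("prototype") per known class.
prototypes = {}  # class name -> np.ndarray

def add_class(name, example_embeddings):
    """Register a new class from a few labelled examples -- no retraining."""
    prototypes[name] = np.mean(example_embeddings, axis=0)

def classify(embedding):
    """Return the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda c: np.linalg.norm(embedding - prototypes[c]))

# Hypothetical usage; real embeddings would come from a pretrained network.
rng = np.random.default_rng(0)
add_class("sparrow", rng.normal(0.0, 1.0, size=(5, 128)))
add_class("owl", rng.normal(3.0, 1.0, size=(5, 128)))
print(classify(rng.normal(3.0, 1.0, size=128)))  # likely "owl"
```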
Hence the need to work on machine learning challenges in many different fields.
“The thing that I like is that when I talk to people from different fields, it gives you new perspectives about solving the problems,” Carneiro says.
“For example, when you compare art images with medical images, the nature of the imaging process is very different; therefore the distributions are very different and the assumptions you have to make about how we’re going to solve the problems are also very different.”
He uses the example of setting up AI to look for a dog in a photograph. Context is a valuable factor. If there is grass in the image, there is a higher probability of it also including a dog than if the image shows an aeroplane. “You don’t program that explicitly,” he says. “Your model will learn that.”
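As a back-of-the-envelope illustration, with invented counts, the kind of contextual prior a model absorbs from its training data looks like this:

```python
# Hypothetical co-occurrence counts from a training set (invented numbers).
images_with_grass = 10_000
dogs_given_grass = 2_400         # images containing both grass and a dog
images_with_aeroplane = 10_000
dogs_given_aeroplane = 30        # dogs rarely share a frame with a plane

p_dog_given_grass = dogs_given_grass / images_with_grass          # 0.24
p_dog_given_plane = dogs_given_aeroplane / images_with_aeroplane  # 0.003

print(f"P(dog | grass)     = {p_dog_given_grass:.3f}")
print(f"P(dog | aeroplane) = {p_dog_given_plane:.3f}")
```

Nothing here is hand-coded into the system; a trained model picks up such statistics implicitly from its training images.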
He talks, too, about programming AI to look for different things in different environments. A set of medical images of, say, the human digestive system will all be substantially the same. The classification system needs to be optimised to spot the small differences – outliers – that could indicate illness.
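A rough sketch of that idea, using invented feature vectors rather than real medical data: when a cohort of images is nearly identical, even a simple distance-from-the-mean test can surface the outliers that deserve a closer look.

```python
import numpy as np

def flag_outliers(features, z_threshold=3.0):
    """Flag images whose feature vectors sit far from the cohort mean."""
    mean = features.mean(axis=0)
    dists = np.linalg.norm(features - mean, axis=1)
    z = (dists - dists.mean()) / dists.std()
    return np.where(z > z_threshold)[0]  # indices of suspicious images

# Hypothetical data: 200 near-identical scans plus one anomaly.
rng = np.random.default_rng(1)
scans = rng.normal(0.0, 0.1, size=(201, 64))
scans[137] += 2.0              # simulate a visible abnormality
print(flag_outliers(scans))    # -> [137]
```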
A question of perception
With art, however, the pictures are wildly more varied. We humans can look at a set of painted trees – by Picasso, Van Gogh, Constable and a five-year-old, for example – and instantly recognise them all as representations of trees.
“Perceptually, they are all the same,” Carneiro says. But robots don’t have perception.
The work of researchers in semantic representation and classification aims to formulate a set of markers broad and flexible enough to allow an AI system to recognise a tree painted by a never-before-encountered artist – but restricted enough that the system can also tell that Picasso’s Weeping Woman does not represent a eucalypt.
“That’s a very, very hard problem in artistic image analysis, just to recognise stuff and have this high-level interpretation of the whole scene,” Carneiro explains.
“This is what drives me towards developing new ideas and approaches. Talking to artists and then talking to doctors gives different perspectives to the problems.
“The main techniques are always machine learning and computer vision, but to solve different problems I have to make different assumptions – and those lead to different models. And that’s where the novelty comes.”
This story was first published by the Australian Centre for Robotic Vision and is re-published with permission.