
ChatGPT lacks spatial sense

04 March 2024

Attempts by ChatGPT and other language models to analyze maritime traffic situations do not end well. They simply lack the ability to reason spatially. However, this might change, believe the researchers behind a new Lighthouse preliminary study.

Exactly a year ago, the research project COLREG2 examined how well AI-based decision support systems developed for shipping worked in reality. It was known that they could handle simple traffic situations in open waters, but how would they fare in more complex scenarios, such as when multiple vessels encounter each other in coastal waters, each with different settings in their decision support systems?

"Ship captains have different preferences and different ways of resolving situations, which can be compared to algorithms with different settings. When we compared in that way, it became clear that human captains act according to a pattern that still looks very organized, whereas the algorithms became pure spaghetti. I didn't expect the results to be so messy," said Reto Weber, a lecturer in technology at Chalmers University, who led the project during when the report was publicated.

So the algorithms didn't stand a chance against human commanders. And that wasn't surprising. Including all the factors that affect human decision-making in traffic situations – experience, flexibility, and seamanship – in artificial intelligence will require machine learning, more advanced neural networks, and an enormous amount of data, the researchers wrote in the report.

But shortly after publication, it turned out that new language models like ChatGPT have an ability to understand and to some extent reason about complex texts and tasks – something that addresses the shortcomings of the algorithms evaluated in the COLREG2 project. So could the use of large language models in maritime decision support systems possibly work better? This question has been explored in the preliminary study COLREG3 – Exploring the potential of large language models in marine navigation systems.

"In November, it became possible to use images with ChatGPT. So we made simple diagrams with traffic situations where vessels were illustrated with triangles to see if ChatGPT could interpret them correctly. The result clearly showed that it didn't handle it particularly well," says Luis Sanchez-Heres at RISE, who led the project.

But there are other advanced language models, the researchers reasoned, so they tested several of them.

"We were quite surprised, but most large language models are quite bad at spatial reasoning. We ran a model with several questions where they were asked to choose starboard or port. They only answered correctly in 60 percent of the cases."

But that doesn't mean the researchers have given up hope of eventually incorporating large language models into decision support systems for maritime traffic situations.

"Our plan is to continue running our tests on the language models regularly. It only takes ten minutes. Because sooner or later, the language models will become good at this. The development is going crazy fast. When we started the project, we could only get answers in text, six months later, OpenAI can create videos. So who knows what it will look like in another six months?" says Luis Sanchez-Heres.

The report COLREG3 – Exploring the potential of large language models in marine navigation systems was authored by: Luis Sanchez-Heres, RISE; Reto Weber, Chalmers; Fredrik Ahlgren, Linnaeus University; Fredrik Olsson, RISE; Oxana Lundström, Linnaeus University.

In collaboration with: Carl Petersson, Zeabuz; Tobias Husberg, Cstrider.

