A Review of Synthetic Data in Autonomous Vehicles


Synthetic data is already widely used by autonomous vehicle companies. Simulation lets them validate new driverless technologies without endangering real people. The simulation systems differ from company to company, however, and many companies have developed their own proprietary platforms. We review five of them below.

Leading Companies

  • Alphabet’s Waymo
  • GM Cruise
  • Tesla Autopilot
  • Argo AI
  • Aurora

Alphabet’s Waymo

Carcraft is Waymo’s simulation platform. According to a 2017 article in The Atlantic, 25,000 virtual autonomous cars drive 24 hours a day, 7 days a week through virtual versions of Austin, Phoenix, and Mountain View.

The simulation platform allowed Waymo to log 2.5 billion synthetic miles in 2016, letting the company test high-risk traffic stops, car crashes, and obstacle avoidance without risking human casualties.

Multispectral imaging also plays a big role in Waymo’s simulation work. Through its partnership with Fiat Chrysler, Waymo uses Chrysler Pacifica minivans equipped with LIDAR, radar, and camera sensors to detect and identify objects in the surrounding environment. LIDAR and radar data can be challenging to label manually because humans have trouble interpreting these images. Simulations and virtual environments sidestep this problem: the simulator already knows the position and type of every object, so synthetic LIDAR and radar data arrives with its labels attached.

GM Cruise

GM Cruise, the autonomous driving subsidiary of General Motors, uses a simulation platform called The Matrix. The simulation platform, built with Unreal Engine, allows GM Cruise to drive 200,000 virtual miles per day.

The simulation platform functions similarly to Waymo’s Carcraft but contains full 3D art, which helps GM Cruise generate massive synthetic training sets for object detection deep learning models. Each virtual car is reported to generate 300 terabytes of synthetic data.

To power this massive simulation, GM Cruise relies on Google Cloud Platform. According to Adrian Macneil, the director of engineering at GM Cruise, the simulation platform uses 5,000 GPUs and 300,000 processor cores daily. This is an extreme use case of synthetic data: The Matrix must handle LIDAR, radar, and video camera data from 30,000 virtual cars. Testing at this scale is intended to pave the way for autonomous vehicle adoption and public acceptance.

Tesla Autopilot

Unlike its competitors, Tesla keeps a relatively low profile regarding its simulation technology.

Tesla is notable among autonomous vehicle companies because it relies largely on video cameras and eschews LIDAR sensing. This suggests that Tesla’s simulation system is highly focused on photorealism, which is essential for video-based deep learning.

One of the most interesting things about Tesla is its use of fleet data: by April 2020, Tesla owners had driven over three billion miles. Tesla gathers data from vehicle owners and uses that data to help train machine learning algorithms. For example, Tesla can compare a car’s projected path of motion (as if it were on Autopilot) against the real driver’s actions, and then use the results to improve its self-driving algorithms. Some observers suggest that Tesla’s fleet data is a significant competitive advantage.
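
The path comparison described above can be sketched in a few lines. This is a minimal illustration, not Tesla’s actual method: the function names, the 2D path format, and the 0.5 meter threshold are all assumptions made for the example. It measures how far a planned path strays from the path the human driver actually took and flags the clips where the two disagree, since those are the clips most likely to be useful for retraining.

```python
from math import hypot

def path_deviation(planned, driven):
    """Mean point-wise distance (meters) between two equal-length 2D paths."""
    if len(planned) != len(driven) or not planned:
        raise ValueError("paths must be non-empty and the same length")
    total = sum(hypot(px - dx, py - dy)
                for (px, py), (dx, dy) in zip(planned, driven))
    return total / len(planned)

def flag_disagreements(clips, threshold_m=0.5):
    """Return ids of clips where planner and driver diverged (hypothetical threshold)."""
    return [clip_id for clip_id, planned, driven in clips
            if path_deviation(planned, driven) > threshold_m]

# Hypothetical clips: (id, planned path, path the human actually drove).
clips = [
    ("clip-001", [(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0.1), (2, 0.1)]),
    ("clip-002", [(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1.0), (2, 2.0)]),
]
print(flag_disagreements(clips))  # only the second clip exceeds the threshold
```

A production system would compare far richer signals (speed, steering, timing), but the idea is the same: disagreement between the planner and the human is a cheap way to find interesting data.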

Argo AI (backed by Ford)

Argo AI also relies on synthetic data: as of May 2020, the startup was simulating one million miles every night. In 2017, Ford invested $1 billion in Argo AI, enabling the company to ramp up its simulation operations.
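
To put the mileage figures quoted in this review side by side, the short script below converts each one to simulated miles per day. It is only a rough comparison: the figures come from different years and sources, and the choice to spread Waymo’s 2016 total evenly over the 366 days of that leap year is an assumption of this example.

```python
# (reported miles, number of days the figure covers), taken from this review.
REPORTED = {
    "Waymo (2016)": (2.5e9, 366),        # 2.5 billion miles over a leap year
    "GM Cruise": (200_000, 1),           # 200,000 virtual miles per day
    "Argo AI (May 2020)": (1_000_000, 1),  # one million miles every night
}

def miles_per_day(total_miles, days):
    """Average simulated miles per day over the reporting period."""
    return total_miles / days

rates = {name: miles_per_day(miles, days)
         for name, (miles, days) in REPORTED.items()}

# Print the companies from highest to lowest daily simulated mileage.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {rate:>12,.0f} miles/day")
```

On these numbers, Waymo’s 2016 total works out to roughly 6.8 million simulated miles per day.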

Little is known about Argo AI’s simulation efforts; however, Ford’s recent acquisitions provide some insight into its technological strategy.

In 2019, Ford acquired Quantum Signal AI, a startup that created synthetic environments for self-driving cars. Quantum Signal AI’s 3D environments were not photorealistic, but they demonstrated that synthetic training data is crucial to helping AI/ML algorithms solve unexpected edge cases.

More recently, Ford was a participant at Nvidia’s GPU Technology Conference (GTC), where speakers addressed generating diverse, photorealistic synthetic training data. This suggests that Ford is actively working on synthetic data R&D alongside Argo AI, most likely for autonomous vehicles.

Aurora (backed by Amazon and Sequoia)

Aurora is one of the “unicorn” autonomous vehicle startups, backed by Amazon and Sequoia. The startup leans heavily on synthetic data, as evidenced by its August 2019 blog post on Medium.

What’s unique about Aurora’s approach is that the company built its simulator, named the offline executor, in-house from scratch. The company says that its simulator software focuses on short, specific situations, such as making an unprotected left-hand turn.
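
Testing a short, specific situation usually means running many variations of it. The sketch below illustrates that idea for an unprotected left turn. It does not reflect Aurora’s software: the `LeftTurnScenario` class, the parameter values, and the simple gap rule are all hypothetical, chosen only to show how one situation can be expanded into a grid of test cases.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class LeftTurnScenario:
    oncoming_speed_mps: float   # speed of the oncoming vehicle
    oncoming_distance_m: float  # its distance from the intersection at t = 0
    turn_duration_s: float      # time the ego vehicle needs to clear the lane

def is_safe_to_turn(scenario, margin_s=1.0):
    """Toy rule: turn only if the oncoming car arrives after we clear, plus a margin."""
    time_to_arrival = scenario.oncoming_distance_m / scenario.oncoming_speed_mps
    return time_to_arrival >= scenario.turn_duration_s + margin_s

def sweep():
    """Expand one situation into every combination of the parameter values."""
    speeds = [10.0, 15.0, 20.0]       # hypothetical values, meters per second
    distances = [40.0, 80.0, 120.0]   # hypothetical values, meters
    durations = [4.0]                 # hypothetical value, seconds
    return [LeftTurnScenario(v, d, t)
            for v, d, t in product(speeds, distances, durations)]

scenarios = sweep()
unsafe = [s for s in scenarios if not is_safe_to_turn(s)]
print(f"{len(unsafe)} of {len(scenarios)} variations leave too small a gap")
```

In a real simulator, each variation would be driven by the actual planning software rather than a one-line rule, and the unsafe cases would be the ones engineers examine first.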

Like the other leading autonomous vehicle companies, Aurora has a system for fusing synthetic data with real-world testing. Aurora uses a constant cycle of simulation and real driving to fine-tune its computer vision algorithms. For example, Aurora’s simulation system has already practiced driver ‘nudging’ 20 million times, which has helped Aurora carry simulated maneuvers over to real-world driving.

Looking Ahead

Three major trends suggest that simulations will become more capable over time:

  • Computing power is becoming more affordable.
  • Rendering technology is increasingly photorealistic.
  • Sensors are improving in resolution.

Autonomous vehicles benefit from all three trends. More computing power means more complex simulations. Advancements in rendering technology result in more realistic training data. And better sensors (LIDAR, infrared, visible-light, and radar) create more opportunities to apply these advances to automotive.

The automotive industry will lead the way when it comes to these technologies, but it will not be the only adopter. There are many use cases across the economy, some forecast and others unforeseen, that will radically disrupt the way business is done. Self-driving cars are just the start: there are many more ways that simulations, synthetic data, and sensors will improve our lives.


Fortune 500 companies and venture capitalists alike are putting a lot of money into synthetic data. Autonomous vehicles are certainly the cutting edge of AI in many respects, but there are many emerging opportunities on the horizon. These early investments in simulations demonstrate a belief that synthetic data will be the game-changing technology of this decade.

Looking for Synthetic Data?

If your company is looking for synthetic data, fill out the contact form in the top right of the menu bar. At Simerse, we can create synthetic data for a wide variety of applications, including automotive industry uses.