Data Collection API

CARLA-based multi-modal sensor data pipeline for the LANCER project. Requires CARLA 0.9.14+ and Python 3.8+.

The carla-data-api module automates the collection of synchronized sensor data from a simulated urban environment. A Dodge Charger 2020 ego vehicle (EV) drives a fixed route while RGB, depth, semantic segmentation, and LiDAR sensors capture the scene simultaneously from three horizontal angles. The process is repeated automatically for every weather preset in WEATHER_PRESETS.csv, producing a structured dataset suitable for training perception and navigation models.

CARLA Server → Spawn EV & NPCs → Attach Sensors → Drive Route → Capture Frames → dataset/

The pipeline iterates over all defined weather conditions, resetting the world between runs so that traffic and lighting conditions change consistently across the dataset.

Output Format

Each captured frame is saved as a single 960 × 960 px PNG containing a 4-row × 3-column sensor grid. Rows correspond to sensor modalities; columns correspond to the three horizontal viewing angles (left −120°, forward 0°, right +120°). Each cell is 320 × 240 px (width × height), so three columns and four rows both add up to 960 px.

Grid Layout

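| Row | Left (−120°) | Front (0°) | Right (+120°) |
|---|---|---|---|
| Row 0 · RGB | RGB | RGB | RGB |
| Row 1 · Depth | Depth (log) | Depth (log) | Depth (log) |
| Row 2 · Semantic | Semantic Seg. | Semantic Seg. | Semantic Seg. |
| Row 3 · LiDAR | LiDAR projection | LiDAR projection | LiDAR projection |

The LiDAR projection in Row 3 is a single strip spanning all three columns (960 px wide).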
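
To illustrate the layout, the composite could be assembled with NumPy roughly as follows. This is a minimal sketch rather than the module's actual code: the function name `assemble_grid`, the `blank` helper, and the input structure are assumptions made for the example.

```python
import numpy as np

CELL_W, CELL_H = 320, 240   # cell size in pixels (width, height)

def assemble_grid(rgb, depth, semantic, lidar_strip):
    """Stack sensor images into one 960 x 960 composite.

    rgb, depth, semantic: lists of three (240, 320, 3) uint8 arrays,
        ordered left, front, right.
    lidar_strip: one (240, 960, 3) uint8 array spanning all columns.
    """
    rows = [np.hstack(cells) for cells in (rgb, depth, semantic)]
    rows.append(lidar_strip)
    return np.vstack(rows)

def blank(value, width=CELL_W):
    """Dummy image filled with a constant grey level (for the demo only)."""
    return np.full((CELL_H, width, 3), value, dtype=np.uint8)

frame = assemble_grid(
    rgb=[blank(50)] * 3,
    depth=[blank(100)] * 3,
    semantic=[blank(150)] * 3,
    lidar_strip=blank(200, width=3 * CELL_W),
)
print(frame.shape)  # (960, 960, 3)
```

Row `r` of the grid then occupies pixel rows `240 * r` to `240 * r + 239`, and column `c` occupies pixel columns `320 * c` to `320 * c + 319`.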

The LiDAR row is produced by projecting the single roof-mounted point cloud onto each of the three camera planes using a pinhole camera model. Point brightness encodes distance logarithmically: bright pixels are close, dark pixels are far.
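
The projection described above can be sketched with a standard pinhole model. The axis convention (x forward, y right, z up), the function name `project_points`, and the exact logarithmic brightness mapping are assumptions for illustration; the module's own mapping may differ.

```python
import numpy as np

def project_points(points, yaw_deg, width=320, height=240, fov_deg=120.0,
                   max_range=100.0):
    """Render sensor-frame points into one camera cell (pinhole model).

    points: (N, 3) array in metres, x forward, y right, z up, relative
        to the shared mount point (assumed convention).
    yaw_deg: camera yaw; positive turns towards +y (right).
    """
    yaw = np.radians(yaw_deg)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Rotate into the camera frame (the camera looks along its own +x).
    xc = x * np.cos(yaw) + y * np.sin(yaw)
    yc = -x * np.sin(yaw) + y * np.cos(yaw)

    # Focal length in pixels from the horizontal field of view.
    focal = width / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    in_front = xc > 1e-6                     # drop points behind the camera
    xc, yc, zc = xc[in_front], yc[in_front], z[in_front]
    u = np.round(width / 2.0 + focal * yc / xc).astype(int)
    v = np.round(height / 2.0 - focal * zc / xc).astype(int)
    dist = np.linalg.norm(points[in_front], axis=1)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, dist = u[inside], v[inside], dist[inside]

    # Log brightness: 255 at 0 m, falling to 0 at max_range (assumed mapping).
    level = 1.0 - np.log1p(dist) / np.log1p(max_range)
    brightness = (255 * np.clip(level, 0.0, 1.0)).astype(np.uint8)

    image = np.zeros((height, width), dtype=np.uint8)
    # When several points hit one pixel, keep the nearest (brightest).
    np.maximum.at(image, (v, u), brightness)
    return image

pts = np.array([[10.0, 0.0, 0.0],    # straight ahead, 10 m
                [50.0, 0.0, 0.0],    # straight ahead, 50 m
                [-10.0, 0.0, 0.0]])  # behind the vehicle
front = project_points(pts, yaw_deg=0.0)
```

Calling the function once per yaw (−120°, 0°, +120°) and joining the results with `np.hstack` would give a 960 px wide strip like the one in Row 3.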

Frames are written to:

dataset/
└── <weather_preset_name>/
    ├── frame_0.png
    ├── frame_1.png
    └── ...

Sample Frames

The following samples were captured in clearday and clearnight conditions from the sample_dataset/ directory.

Figure: Clear Day · frame_0

Figure: Clear Day · frame_1

Figure: Clear Night · frame_0

Figure: Clear Night · frame_1

Requirements

| Dependency | Version | Notes |
|---|---|---|
| CARLA Simulator | 0.9.14+ | Server must be running before the script is invoked |
| Python | 3.8+ | |
| carla (Python API) | Matches server version | Install from your CARLA distribution; see CARLA docs |
| numpy | any | |
| opencv-python | any | Imported as cv2 |
| scikit-image | any | Imported as skimage |

Install the Python dependencies with:

pip install numpy opencv-python scikit-image

Note

The carla Python package is not on PyPI. It must be installed from your local CARLA installation directory. Refer to the CARLA Quick Start guide for instructions.

Usage

Quick Start

  1. Start the CARLA server

    Launch the CARLA simulator. The server must be listening on localhost:2000 (the default port) before running the script.

    # Linux
    ./CarlaUE4.sh
    
    # Windows
    CarlaUE4.exe
  2. Navigate to the API directory

    cd carla-data-api
  3. Run the collection script

    python collect_data.py

    This executes generate_dataset(max_images_per_weather=2, interval=20), collecting 2 frames per weather preset with 20 simulation ticks (~2 s) between captures, across the first 6 weather presets in WEATHER_PRESETS.csv. Output is written to dataset/.

Custom Collection Parameters

Call generate_dataset with different arguments to control the volume and timing of captured data:

from collect_data import generate_dataset

generate_dataset(
    max_images_per_weather=10,   # frames to capture per weather condition
    interval=10                  # simulation ticks between captures (10 ticks ≈ 1 s at 10 Hz)
)

| Parameter | Default | Description |
|---|---|---|
| max_images_per_weather | 2 | Maximum frames saved per weather preset. Collection may stop earlier if the EV reaches the end of the route. |
| interval | 20 | Number of simulation ticks between consecutive captures. At 10 Hz this equals interval × 0.1 s. |
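
As a quick check of the timing arithmetic, the helper below converts the two parameters into simulated seconds and ticks. The function `capture_schedule` is hypothetical and not part of the module; it assumes one capture every `interval` ticks, starting after the first `interval` ticks.

```python
TICK_RATE_HZ = 10  # fixed simulation rate stated in this document

def capture_schedule(max_images_per_weather=2, interval=20,
                     tick_rate_hz=TICK_RATE_HZ):
    """Return (seconds between captures, upper bound on ticks per preset).

    The tick count is only an upper bound: a run may end sooner if the
    EV reaches the end of the route.
    """
    seconds_between = interval / tick_rate_hz
    max_ticks = max_images_per_weather * interval
    return seconds_between, max_ticks

print(capture_schedule())        # (2.0, 40)
print(capture_schedule(10, 10))  # (1.0, 100)
```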

Single-Weather Run

For finer control, call run_simulation directly to collect data for one weather condition without cycling through the full preset list:

import carla
from collect_data import Simulation, run_simulation

client = carla.Client('localhost', 2000)
client.set_timeout(30.0)
sim = Simulation(client)

run_simulation(
    sim=sim,
    weather={
        'sun_altitude_angle': 60.0,
        'cloudiness': 20.0,
        # ... other WeatherParameters fields
    },
    max_images=5,
    interval=15,
    save_path_prefix='my_dataset/clearday/frame'
)

Configuration

Weather Presets (WEATHER_PRESETS.csv)

Each row defines a named weather condition. The name column is used as the output subdirectory name. All other columns map directly to carla.WeatherParameters attributes.

| Column | Range | Description |
|---|---|---|
| name | | Preset identifier; used as the output directory name (e.g. clearday, heavyfog) |
| cloudiness | 0 – 100 | Sky cloud coverage |
| fog_density | 0 – 100 | Fog thickness |
| fog_distance | metres | Distance at which fog starts |
| fog_falloff | | Fog density decay with altitude |
| wetness | 0 – 100 | Road surface wetness |
| precipitation_deposits | 0 – 100 | Standing water / puddles on the road |
| precipitation | 0 – 100 | Rainfall intensity |
| wind_intensity | 0 – 100 | Wind speed |
| sun_altitude_angle | −90° – 90° | Sun elevation; negative values place the sun below the horizon (night) |
| sun_azimuth_angle | 0° – 360° | Sun compass direction |
| dust_storm | 0 – 100 | Dust / sand storm intensity |
| scattering_intensity | | How strongly light contributes to volumetric fog (0 means no contribution) |
| rayleigh_scattering_scale | | Blue-sky scattering coefficient |
| mie_scattering_scale | | Haze / fog scattering coefficient |
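
A minimal sketch of how such a file could be read with the standard library is shown below. The function `load_weather_presets` and the sample rows are illustrative assumptions; they are not taken from the module or from the real WEATHER_PRESETS.csv.

```python
import csv
import io

def load_weather_presets(csv_file):
    """Yield (name, params) pairs; params maps column name to float."""
    for row in csv.DictReader(csv_file):
        name = row.pop("name")
        # Skip empty cells so that missing values keep CARLA's defaults.
        params = {key: float(value) for key, value in row.items()
                  if value not in (None, "")}
        yield name, params

# Made-up sample with only three of the weather columns.
sample = io.StringIO(
    "name,cloudiness,precipitation,sun_altitude_angle\n"
    "clearday,10,0,60\n"
    "clearnight,10,0,-60\n"
)
presets = dict(load_weather_presets(sample))
print(presets["clearnight"]["sun_altitude_angle"])  # -60.0
```

Each `params` dictionary could then be copied onto a `carla.WeatherParameters` instance attribute by attribute, for example with `setattr`, before calling `world.set_weather`.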

Vehicle Probabilities (VEHICLE_PROBABILITIES.csv)

Controls the NPC traffic composition. The global category mix is fixed in the code; the CSV controls how individual models are distributed within each category.

| Column | Description |
|---|---|
| blueprint_id | CARLA blueprint identifier, e.g. vehicle.nissan.micra |
| category | 2W (two-wheelers), 4WP (passenger), 4WU (utility), 4WS (sports) |
| category_probability | Fractional share of this model within its category (values per category must sum to ≤ 1) |

Global category spawn mix (defined in Simulation.__init__):

| Category | Share |
|---|---|
| Two-wheelers (2W) | 30 % |
| Passenger cars (4WP) | 50 % |
| Utility vehicles (4WU) | 12 % |
| Sports cars (4WS) | 8 % |
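
The two-level sampling implied by these tables (category first, then a model within the category) might look like the sketch below. The `MODELS` table and its probabilities are invented for the example, and `sample_blueprint` is a hypothetical name. Note that `random.choices` renormalizes weights, so this sketch treats per-category values that sum to less than 1 as relative weights, which may not match the module's behaviour.

```python
import random

# Global category mix from the table above.
CATEGORY_MIX = {"2W": 0.30, "4WP": 0.50, "4WU": 0.12, "4WS": 0.08}

# Illustrative stand-in for VEHICLE_PROBABILITIES.csv (values are made up).
MODELS = {
    "2W": [("vehicle.yamaha.yzf", 1.0)],
    "4WP": [("vehicle.nissan.micra", 0.6), ("vehicle.audi.tt", 0.4)],
    "4WU": [("vehicle.carlamotors.carlacola", 1.0)],
    "4WS": [("vehicle.ford.mustang", 1.0)],
}

def sample_blueprint(models, rng=random):
    """Pick a blueprint id: category by global mix, then model by share."""
    categories = list(CATEGORY_MIX)
    mix = [CATEGORY_MIX[c] for c in categories]
    category = rng.choices(categories, weights=mix)[0]
    ids, weights = zip(*models[category])
    return rng.choices(ids, weights=weights)[0]

rng = random.Random(0)  # seeded for a reproducible demo
print([sample_blueprint(MODELS, rng) for _ in range(3)])
```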

EV Route

The ego vehicle follows a fixed sequence of map spawn-point indices defined in Simulation.get_spawn_points_and_drive_indices(). To change the route, edit the drive_indices list in that method. Collection stops early for a given weather run if the EV comes within 5 m of the penultimate route waypoint.
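
The early-stop rule can be expressed as a simple distance test. The sketch below works on plain (x, y) tuples in metres; the function name `near_route_end` and the sample coordinates are assumptions, and the module itself presumably compares CARLA locations.

```python
import math

STOP_RADIUS_M = 5.0  # stop distance stated above

def near_route_end(ev_xy, route_xy, radius=STOP_RADIUS_M):
    """True when the EV is within `radius` metres of the penultimate waypoint."""
    target = route_xy[-2]
    return math.dist(ev_xy, target) < radius

# Hypothetical route made of three waypoints.
route = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
print(near_route_end((97.0, 0.0), route))  # True: 3 m from (100, 0)
print(near_route_end((50.0, 0.0), route))  # False: 50 m away
```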

Sensor Setup

All sensors are mounted on the EV roof at a height of 2 × bounding_box.extent.z + 1 m. The LiDAR sensor faces forward; the three camera sets share the same mount point but are rotated to their respective yaw angles.
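
The mount geometry reduces to one height and three yaw angles. In CARLA, `bounding_box.extent` holds half the box size, so 2 × extent.z is the full vehicle height. The helper names and the sample half-height below are hypothetical.

```python
CAMERA_YAWS_DEG = {"left": -120.0, "front": 0.0, "right": 120.0}

def mount_height(extent_z, clearance=1.0):
    """Sensor height in metres: 2 x bounding_box.extent.z + 1 m."""
    return 2.0 * extent_z + clearance

def camera_mounts(extent_z):
    """Return {view: (z, yaw_deg)} for the three shared-mount camera sets."""
    z = mount_height(extent_z)
    return {view: (z, yaw) for view, yaw in CAMERA_YAWS_DEG.items()}

# Hypothetical half-height of 0.75 m, i.e. a vehicle 1.5 m tall.
mounts = camera_mounts(0.75)
print(mounts["front"])  # (2.5, 0.0)
```

Each (z, yaw) pair would map onto a `carla.Transform` with a `carla.Location(z=...)` and a `carla.Rotation(yaw=...)` when the sensor is attached to the EV.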

| Sensor | Blueprint | Resolution / Config | Notes |
|---|---|---|---|
| LiDAR | sensor.lidar.ray_cast | 64 channels, 500 k pts/s, 100 m range, ±20° vertical FoV | Single sensor; projected onto all three camera planes in post-processing |
| RGB camera (×3) | sensor.camera.rgb | 1280 × 960 (4× oversample), 120° FoV; downsampled to 320 × 240 | Post-processing enabled (lens flare, motion blur, chromatic aberration) |
| Depth camera (×3) | sensor.camera.depth | 320 × 240, 120° FoV | Log-depth encoding applied via CARLA colour converter |
| Semantic segmentation (×3) | sensor.camera.semantic_segmentation | 320 × 240, 120° FoV | Class label stored in the red channel; CityScapes colour palette |
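
The 4× oversampling of the RGB cameras means each output pixel summarizes a 4 × 4 block of rendered pixels. The sketch below uses plain block averaging in NumPy as a stand-in; the module may instead rely on a resize routine from cv2 or skimage, and `downsample_4x` is a hypothetical name.

```python
import numpy as np

def downsample_4x(image, factor=4):
    """Average non-overlapping factor x factor blocks of an (H, W, C) image."""
    h, w, c = image.shape
    # Split each spatial axis into (blocks, pixels within block).
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)

# Dummy 1280 x 960 RGB render (array shape is height, width, channels).
render = np.full((960, 1280, 3), 100, dtype=np.uint8)
small = downsample_4x(render)
print(small.shape)  # (240, 320, 3)
```

Averaging over blocks acts as a simple anti-aliasing filter, which is the usual motivation for rendering at a higher resolution and shrinking afterwards.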

Collection Pipeline

The script runs in synchronous mode at a fixed 10 Hz (0.1 s/tick). The Traffic Manager is bound to port 8000.

  1. WeatherIterator loads all presets from WEATHER_PRESETS.csv and yields them one by one.
  2. For each preset, run_simulation reloads the CARLA world, applies the weather parameters, and spawns NPC traffic (up to 200 pedestrians and a configurable number of vehicles sampled from VEHICLE_PROBABILITIES.csv).
  3. The ego vehicle is spawned, teleported to the route start, and handed to the Traffic Manager autopilot with lane-changing disabled.
  4. Sensors are attached and their callbacks push raw data into a thread-safe Queue.
  5. collect_data ticks the simulation, drains the queue after each tick, and assembles the 4-row × 3-column composite image every interval ticks.
  6. Frames are saved to dataset/<preset_name>/frame_N.png until max_images_per_weather is reached or the EV nears the end of the route.
  7. All sensors and NPC actors are destroyed before the next weather preset begins.
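
Steps 4 and 5 rely on matching queued sensor data to the tick that produced it. A minimal sketch of that pattern follows, assuming each callback enqueues a `(frame, sensor_name, data)` tuple; the tuple layout and the function `drain_tick` are assumptions, not the module's actual interface.

```python
import queue

def drain_tick(sensor_queue, frame_id, expected, timeout=2.0):
    """Collect one item per sensor for the given simulation frame.

    Items from earlier frames are discarded. Raises queue.Empty if a
    sensor does not deliver within `timeout` seconds.
    """
    collected = {}
    while len(collected) < expected:
        frame, name, data = sensor_queue.get(timeout=timeout)
        if frame == frame_id:
            collected[name] = data
    return collected

# Demo with plain strings standing in for sensor payloads.
q = queue.Queue()
q.put((41, "rgb_front", "stale"))   # left over from a previous tick
q.put((42, "rgb_front", "image"))
q.put((42, "lidar", "points"))
print(drain_tick(q, frame_id=42, expected=2, timeout=0.1))
```

In synchronous mode `world.tick()` returns the new frame id and every sensor measurement carries a matching `frame` attribute, so the same comparison can be used to pair real data with its tick.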

Notes

Synchronous mode

The script drives the simulation clock manually. Do not run other clients that call world.tick() simultaneously, as this will desynchronize the sensor queue.
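
For reference, enabling synchronous mode at a fixed step typically involves the world settings and the Traffic Manager together. The helper below is a sketch, not the module's code: `enable_sync_mode` is a hypothetical name, and it accepts any objects that expose the same methods as `carla.World` and the Traffic Manager, so it can be read without a running server.

```python
def enable_sync_mode(world, traffic_manager, hz=10):
    """Switch the world and Traffic Manager to synchronous mode.

    world: object with get_settings() / apply_settings(), e.g. carla.World.
    traffic_manager: object with set_synchronous_mode(), e.g. the result
        of client.get_trafficmanager(8000).
    """
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = 1.0 / hz   # 0.1 s per tick at 10 Hz
    world.apply_settings(settings)
    traffic_manager.set_synchronous_mode(True)
    return settings

# Typical call against a live server (requires the carla package):
#   client = carla.Client("localhost", 2000)
#   enable_sync_mode(client.get_world(), client.get_trafficmanager(8000))
```

After this call the simulation only advances when a client calls `world.tick()`, which is why a second ticking client would disturb the capture loop.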

Traffic Manager port

The Traffic Manager is bound to port 8000. Make sure this port is free before starting the script. If another CARLA client is already using port 8000, change the port in Simulation.__init__ and in EV.initialize_autopilot.

Pedestrian re-routing

Walkers that reach their destination are automatically assigned a new random destination. Walkers that die mid-run are destroyed and replaced so that pedestrian density stays approximately constant.