visionsim.cli package

Submodules

visionsim.cli.blender module

visionsim.cli.blender.render_animation(blend_file: Path, output_dir: Path, /, config: RenderConfig, frame_start: int | None = None, frame_end: int | None = None, output_file: Path | None = None, dry_run: bool = False) None[source]

Create datasets by rendering out a sequence from a single blend-file.

Parameters:
  • blend_file – Path to blend file.

  • output_dir – Dataset output folder.

  • config – Render configuration.

  • frame_start – Start rendering at this frame index (inclusive).

  • frame_end – Stop rendering at this frame index (inclusive).

  • output_file – If set, write the modified blend file to this path. Helpful for troubleshooting. Defaults to not saving.

  • dry_run – if true, nothing will be rendered at all. Defaults to False.

visionsim.cli.dataset module

visionsim.cli.dataset.convert(input_dir: Path, output_dir: Path | None = None, force: bool = False) None[source]

Convert a .db database to a .json or vice-versa.

Parameters:
  • input_dir – directory in which to look for the dataset

  • output_dir – directory in which to save the new dataset. If not set, the new metadata file is saved in the same directory; otherwise, all data is copied over to a new directory.

  • force – if true, overwrite output file(s) if present

visionsim.cli.dataset.merge(input_files: list[Path], names: list[str] | None = None, output_file: Path = PosixPath('combined.json')) None[source]

Merge one or more dataset files.

Typically there will be one dataset file per data type (frames, depth, etc.), but these can be combined if they are compatible (same number of frames, same camera, etc.) to yield Nerfstudio-compatible “transforms.json” files that might have a “depth_file_path” or “mask_path” in addition to a “file_path”. This does not touch the underlying data; it only modifies the transforms files.

This can be used to rename a data type for a single file, merge multiple metadata files that already have distinct data type names, or merge and rename many metadata files altogether.

Parameters:
  • input_files (list[Path]) – List of datasets to merge; each can be either the path of a metadata file or its directory.

  • names (list[str] | None, optional) – What to rename each “path” argument to. Defaults to None.

  • output_file (Path, optional) – Where to save metadata file, should be a .json file. Defaults to “combined.json”.
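
Example

As a minimal sketch of the merge described above, assuming a simplified Nerfstudio-style layout where each metadata dict holds a “frames” list (the helper name and structure here are illustrative, not the actual implementation):

```python
# Illustrative merge of per-type metadata files (a sketch, not the real
# implementation): each input's "file_path" is renamed to the matching
# entry of `names`, and frame counts are checked for compatibility.
def merge_transforms(datasets, names):
    n = len(datasets[0]["frames"])
    if any(len(ds["frames"]) != n for ds in datasets):
        raise ValueError("datasets must have the same number of frames")
    merged_frames = []
    for i in range(n):
        # Keep any shared per-frame fields (e.g. transform_matrix) ...
        frame = {k: v for k, v in datasets[0]["frames"][i].items()
                 if k != "file_path"}
        # ... and gather one renamed path per input dataset.
        for ds, name in zip(datasets, names):
            frame[name] = ds["frames"][i]["file_path"]
        merged_frames.append(frame)
    merged = {k: v for k, v in datasets[0].items() if k != "frames"}
    merged["frames"] = merged_frames
    return merged

frames_meta = {"frames": [{"file_path": "frames/0000.png"}]}
depths_meta = {"frames": [{"file_path": "depths/0000.exr"}]}
combined = merge_transforms([frames_meta, depths_meta],
                            ["file_path", "depth_file_path"])
```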

visionsim.cli.dataset.to_pointcloud(colors: Path, depths: Path | None = None, points: Path | None = None, output: Path = PosixPath('pointcloud.ply'), p: float = 0.15, binary: bool = True, force: bool = False) None[source]

Generate a .ply point cloud from datasets.

Parameters:
  • colors – path to the dataset to use for point colors; must contain RGB data, assumed to be uint8.

  • depths – path to dataset to use for depth-based 3D points. A pinhole camera model is used to project depth values to 3D points.

  • points – path to dataset to use for world-space 3D points. If set, this will be used instead of depth-based points.

  • output – path to save PLY file to.

  • p – probability of sampling a pixel.

  • binary – If true, save as a binary PLY file (smaller and faster).

  • force – If true, overwrite output file if present.
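
Example

The depth-based projection can be sketched with a standard pinhole model (the intrinsics below are made up for illustration; the actual CLI reads them from the dataset metadata):

```python
import numpy as np

# Illustrative pinhole back-projection, as used for depth-based points
# (a sketch; fx/fy/cx/cy are assumed example intrinsics).
def depth_to_points(depth, fx, fy, cx, cy):
    """Project an (h, w) depth map to (h*w, 3) camera-space 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

pts = depth_to_points(np.ones((4, 4)), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```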

visionsim.cli.emulate module

visionsim.cli.emulate.spad(input_dir: Path, output_dir: Path, flux_gain: float = 1.0, bitplanes: int = 1, bitdepth: int | None = None, force_gray: bool = False, seed: int = 2147483647, pattern: str | None = None, max_size: int = 1000, force: bool = False) None[source]

Perform binomial sampling on linearized RGB frames to yield (summed) single photon frames

This will save numpy files, which may be bitpacked (when bitplanes == 1) and may have different dtypes depending on the number of summed bitplanes. The shape of the output arrays will be (max_size, h, w, c) or (remainder, h, w, c), where remainder = len(dataset) % max_size; the width dimension is ceil(w / 8) when bitpacked.

If the input contains an alpha channel (determined by the last dimension of the input images), it will be stripped.

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save single photon frames

  • pattern – used to find source image files to convert to single photon frames, not needed when input_dir points to a valid dataset.

  • flux_gain – multiplicative factor controlling dynamic range of output

  • bitplanes – number of summed binary measurements

  • bitdepth – if set, bitplanes will be overridden to 2**bitdepth - 1

  • force_gray – if true, disable RGB sensing even if the input images are color

  • seed – random seed to use while sampling, ensures reproducibility

  • max_size – maximum number of frames per output array before rolling over to new file

  • force – if true, overwrite output file(s) if present, else throw error
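
Example

The sampling model can be sketched as follows (an assumption based on the standard single-photon model; the exact implementation may differ):

```python
import numpy as np

# Sketch of the binomial single-photon model (assumed): a pixel with
# linear intensity `phi` fires with probability 1 - exp(-flux_gain * phi);
# summing `bitplanes` such Bernoulli draws gives a binomial photon count.
rng = np.random.default_rng(2147483647)
frames = rng.random((4, 8, 16, 3))        # (n, h, w, c) linear RGB in [0, 1]
flux_gain, bitplanes = 1.0, 1
p = 1.0 - np.exp(-flux_gain * frames)
binary = rng.binomial(bitplanes, p).astype(np.uint8)

# When bitplanes == 1, frames can be bitpacked along the width axis,
# so the stored width becomes ceil(w / 8):
packed = np.packbits(binary, axis=2)
```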

visionsim.cli.emulate.events(input_dir: Path, output_dir: Path, fps: int, pattern: str | None = None, pos_thres: float = 0.2, neg_thres: float = 0.2, sigma_thres: float = 0.03, cutoff_hz: int = 200, leak_rate_hz: float = 1.0, shot_noise_rate_hz: float = 10.0, seed: int = 2147483647, force: bool = False) None[source]

Emulate an event camera using v2e and high-speed input frames

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save events

  • fps – frame rate of input sequence

  • pattern – used to find source image files to convert to events, not needed when input_dir points to a valid dataset.

  • pos_thres – nominal threshold of triggering positive event in log intensity

  • neg_thres – nominal threshold of triggering negative event in log intensity

  • sigma_thres – std deviation of threshold in log intensity

  • cutoff_hz – 3 dB cutoff frequency in Hz of the DVS photoreceptor, default: 200

  • leak_rate_hz – leak event rate per pixel in Hz, from junction leakage in reset switch

  • shot_noise_rate_hz – shot noise rate in Hz

  • seed – random seed to use while sampling, ensures reproducibility

  • force – if true, overwrite output file(s) if present, else throw error

visionsim.cli.emulate.rgb(input_dir: Path, output_dir: Path, chunk_size: int = 10, shutter_frac: float = 1.0, readout_std: float = 16.0, fwc: float | None = None, flux_gain: float = 4096.0, iso_gain: float = 1.0, adc_bitdepth: int = 12, mosaic: bool = False, demosaic: Literal['off', 'bilinear', 'MHC04'] = 'MHC04', denoise_sigma: float = 0.0, sharpen_weight: float = 0.0, pattern: str | None = None, force: bool = False) None[source]

Simulate a real camera, adding read/Poisson noise and tonemapping

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save binary frames

  • chunk_size – number of consecutive frames to average together

  • shutter_frac – fraction of inter-frame duration shutter is active (0 to 1)

  • readout_std – standard deviation of gaussian read noise in photoelectrons

  • fwc – full well capacity of sensor in photoelectrons

  • flux_gain – factor to scale the input images before Poisson simulation

  • iso_gain – gain for photo-electron reading after Poisson rng

  • adc_bitdepth – ADC bitdepth

  • mosaic – if true, simulate mosaiced R/G/B pixels instead of an innately 3-channel sensor

  • demosaic – demosaicing method (default Malvar et al.’s method)

  • denoise_sigma – Gaussian blur with this sigma will be used (default 0.0 disables this)

  • sharpen_weight – weight used in sharpening (default 0.0 disables this)

  • pattern – used to find source image files to convert to rgb frames, not needed when input_dir points to a valid dataset.

  • force – if true, overwrite output file(s) if present
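
Example

A minimal sketch of the noise pipeline suggested by the parameters above (the ADC mapping and full-well value here are assumptions, not the exact implementation):

```python
import numpy as np

# Illustrative sensor model: scale linear intensities by flux_gain, draw
# Poisson photoelectrons, clip to the full well capacity, add Gaussian
# read noise, apply ISO gain, and quantize with an adc_bitdepth-bit ADC.
rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))             # linear RGB in [0, 1]
flux_gain, readout_std, iso_gain = 4096.0, 16.0, 1.0
fwc, adc_bitdepth = 32768.0, 12             # fwc is an assumed value

electrons = rng.poisson(flux_gain * frame).astype(np.float64)
electrons = np.minimum(electrons, fwc)                       # full-well clip
electrons += rng.normal(0.0, readout_std, electrons.shape)   # read noise
levels = 2**adc_bitdepth - 1
digital = np.clip(np.round(iso_gain * electrons / fwc * levels), 0, levels)
```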

visionsim.cli.emulate.imu(input_dir: Path, output_file: Path | None = None, seed: int = 2147483647, gravity: str = '(0.0, 0.0, -9.8)', dt: float = 0.00125, init_bias_acc: str = '(0.0, 0.0, 0.0)', init_bias_gyro: str = '(0.0, 0.0, 0.0)', std_bias_acc: float = 5.5e-05, std_bias_gyro: float = 2e-05, std_acc: float = 0.008, std_gyro: float = 0.0012, force: bool = False) None[source]

Simulate data from a co-located IMU using the poses in a transforms.json or transforms.db file.

Parameters:
  • input_dir – directory in which to look for transforms.

  • output_file – file in which to save simulated IMU data. Prints to stdout if omitted.

  • seed – RNG seed value for reproducibility.

  • gravity – gravity vector in world coordinate frame. Given in m/s^2.

  • dt – time between consecutive transforms.json poses (assumed regularly spaced). Given in seconds.

  • init_bias_acc – initial bias/drift in accelerometer reading. Given in m/s^2.

  • init_bias_gyro – initial bias/drift in gyroscope reading. Given in rad/s.

  • std_bias_acc – stdev for random-walk component of error (drift) in accelerometer. Given in m/(s^3 sqrt(Hz))

  • std_bias_gyro – stdev for random-walk component of error (drift) in gyroscope. Given in rad/(s^2 sqrt(Hz))

  • std_acc – stdev for white-noise component of error in accelerometer. Given in m/(s^2 sqrt(Hz))

  • std_gyro – stdev for white-noise component of error in gyroscope. Given in rad/(s sqrt(Hz))

  • force – if true, overwrite output file(s) if present
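
Example

The error model implied by these parameters can be sketched as a random-walk bias plus white noise, discretized at dt (an illustrative model; the actual discretization may differ):

```python
import numpy as np

# Illustrative IMU error model (accelerometer shown; gyroscope is
# analogous): the bias random walk is scaled by sqrt(dt), while the
# white-noise term is scaled by 1/sqrt(dt), per the units given above.
rng = np.random.default_rng(2147483647)
dt, n = 0.00125, 1000
std_bias_acc, std_acc = 5.5e-5, 0.008
init_bias_acc = np.zeros(3)

bias = init_bias_acc + np.cumsum(
    rng.normal(0.0, std_bias_acc * np.sqrt(dt), (n, 3)), axis=0)
white = rng.normal(0.0, std_acc / np.sqrt(dt), (n, 3))
acc_noise = bias + white    # added on top of the true specific force
```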

visionsim.cli.ffmpeg module

visionsim.cli.ffmpeg.animate(input_dir: Path, pattern: str | None = None, outfile: Path = PosixPath('out.mp4'), fps: float | None = None, crf: int = 22, vcodec: str = 'libx264', step: int = 1, multiple: int | None = None, force: bool = False, bg_color: str = 'black', strip_alpha: bool = False) None[source]

Combine generated frames into an MP4 using ffmpeg wizardry.

This is roughly equivalent to running the “image2” demuxer in ffmpeg, with the added benefit of being able to skip frames using a step size, strip alpha channels from PNGs, and automatically handling the case where the input frames are numpy arrays. If using a dataset as input, the bitplanes attribute will be used to normalize the data to uint8 for visualization.

Parameters:
  • input_dir – directory in which to look for frames

  • pattern – If provided search for files matching this pattern. Otherwise, look for a valid dataset in the input directory.

  • outfile – where to save generated mp4

  • fps – frames per second in video, if None, will be inferred from dataset

  • crf – constant rate factor for video encoding (0-51); lower means better quality but larger files

  • vcodec – video codec to use (either libx264 or libx265)

  • step – drop some frames when making video, use frames 0+step*n

  • multiple – some codecs require size to be a multiple of n

  • force – if true, overwrite output file if present

  • bg_color – for images with transparencies, namely PNGs, use this color as a background

  • strip_alpha – if true, do not pre-process PNGs to remove transparencies

visionsim.cli.ffmpeg.combine(matrix: str, outfile: Path = PosixPath('combined.mp4'), mode: str = 'shortest', color: str = 'white', multiple: int = 2, force: bool = False) None[source]

Combine multiple videos into one by stacking, padding and resizing them using ffmpeg.

Internally this task will first optionally pad all videos to length using ffmpeg’s tpad filter, then scale all videos in a row to the same height, combine rows using the hstack filter, and finally scale the row-videos to the same width and vstack them together.

Parameters:
  • matrix – Way to specify videos to combine as a 2D matrix of file paths

  • outfile – where to save generated mp4

  • mode – if ‘shortest’, the combined video will last as long as the shortest input video. If ‘static’, the last frame of videos that are shorter than the longest input video will be repeated. If ‘pad’, all videos are padded with frames of color to last the same duration.

  • color – color to pad videos with, only used if mode is ‘pad’

  • multiple – some codecs require size to be a multiple of n

  • force – if true, overwrite output file if present

Example

The input videos are specified as a 2D matrix of file paths using the --matrix argument like so:

$ visionsim ffmpeg.combine --matrix='[["a.mp4", "b.mp4"]]' --outfile="output.mp4"

visionsim.cli.ffmpeg.grid(input_dir: Path, width: int = -1, height: int = -1, pattern: str = '*.mp4', outfile: Path = PosixPath('combined.mp4'), force: bool = False) None[source]

Make a mosaic from videos in a folder, organizing them in a grid

Parameters:
  • input_dir – directory containing all video files (mp4s expected)

  • width – width of video grid to produce

  • height – height of video grid to produce

  • pattern – use files that match this pattern as inputs

  • outfile – where to save generated mp4

  • force – if true, overwrite output file if present

visionsim.cli.ffmpeg.count_frames(input_file: Path, /) int[source]

Count the number of frames a video file contains using ffprobe

Parameters:

input_file – video file input

Returns:

Number of frames in video.

Return type:

int

visionsim.cli.ffmpeg.duration(input_file: Path, /) float[source]

Return duration (in seconds) of first video stream in file using ffprobe

Parameters:

input_file – video file input

Returns:

Video duration in seconds.

Return type:

float

visionsim.cli.ffmpeg.dimensions(input_file: Path) tuple[int, int][source]

Return size (WxH in pixels) of first video stream in file using ffprobe

Parameters:

input_file – video file input

Returns:

Video size as a (width, height) tuple.

Return type:

tuple[int, int]

visionsim.cli.ffmpeg.extract(input_file: Path, output_dir: Path, pattern: str = 'frames_%06d.png') None[source]

Extract frames from video file

Parameters:
  • input_file – path to the video file from which to extract frames

  • output_dir – directory in which to save extracted frames

  • pattern – filenames of frames will match this pattern

visionsim.cli.interpolate module

visionsim.cli.interpolate.video(input_file: Path, output_file: Path, method: str = 'rife', n: int = 2) None[source]

Interpolate video by extracting all frames, performing frame-wise interpolation and re-assembling video

Parameters:
  • input_file – path to video file from which to extract frames

  • output_file – path in which to save interpolated video

  • method – interpolation method to use, only RIFE (ECCV22) is supported for now, default: ‘rife’

  • n – interpolation factor, must be a multiple of 2, default: 2

visionsim.cli.interpolate.dataset(input_dir: Path, output_dir: Path, pattern: str | None = None, method: Literal['rife'] = 'rife', n: int = 2) None[source]

Interpolate between a series of frames or a dataset (both its images and poses)

Note

This only works if the dataset has a single camera, as interpolating camera settings or types is not possible. Further, the data needs to be saved as images.

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save interpolated frames

  • pattern – used to find source image files to interpolate from, not needed when input_dir points to a valid dataset.

  • method – interpolation method to use, only RIFE (ECCV22) is supported for now, default: ‘rife’

  • n – interpolation factor, must be a multiple of 2, default: 2
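
Example

Pose interpolation can be sketched as lerping positions and slerping orientations (an assumed scheme for illustration; the actual CLI may differ):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Illustrative pose interpolation: with n = 2, one pose is inserted
# between each adjacent pair, halving the pose spacing.
times = np.array([0.0, 1.0])
rots = Rotation.from_euler("z", [0.0, 90.0], degrees=True)
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

slerp = Slerp(times, rots)                  # spherical interpolation
t_new = np.array([0.0, 0.5, 1.0])
new_rots = slerp(t_new)
new_pos = np.array([
    np.interp(t_new, times, positions[:, k]) for k in range(3)]).T
```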

visionsim.cli.transforms module

visionsim.cli.transforms.colorize_depths(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', cmap: str = 'turbo', ext: str = '.png', vmin: float | None = None, vmax: float | None = None, quantile: float = 0.01, step: int = 1) None[source]

Convert .exr depth maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • cmap – which matplotlib colormap to use

  • ext – which format to save colorized frames as

  • vmin – minimum expected depth used to normalize colormap

  • vmax – maximum expected depth used to normalize colormap

  • quantile – if vmin/vmax are None, use this quantile to estimate them

  • step – drop some frames when colorizing, use frames 0+step*n
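
Example

The quantile-based normalization can be sketched as follows (illustrative; “turbo” is the default colormap named above):

```python
import numpy as np
from matplotlib import colormaps

# Illustrative quantile normalization: when vmin/vmax are not given,
# estimate them from the depth distribution, then map normalized depths
# through the colormap.
depth = np.random.default_rng(0).uniform(0.5, 10.0, (64, 64))
quantile = 0.01
vmin, vmax = np.quantile(depth, [quantile, 1.0 - quantile])
normalized = np.clip((depth - vmin) / (vmax - vmin), 0.0, 1.0)
rgba = colormaps["turbo"](normalized)       # (h, w, 4) floats in [0, 1]
```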

visionsim.cli.transforms.colorize_flows(input_dir: Path, output_dir: Path, direction: Literal['forward', 'backward'] = 'forward', pattern: str = '**/*.exr', ext: str = '.png', vmax: float | None = None, quantile: float = 0.01, step: int = 1) None[source]

Convert .exr optical flow maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • direction – direction of flow to colorize

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • vmax – maximum expected flow magnitude

  • quantile – if vmax is None, use this quantile to estimate it

  • step – drop some frames when colorizing, use frames 0+step*n
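
Example

A common flow-colorization scheme maps direction to hue and magnitude to saturation (an assumption; the actual encoding may differ):

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

# Illustrative flow colorization: flow direction becomes hue, magnitude
# (normalized by a quantile-estimated vmax) becomes saturation.
flow = np.random.default_rng(0).normal(0.0, 2.0, (32, 32, 2))
magnitude = np.linalg.norm(flow, axis=-1)
vmax = np.quantile(magnitude, 0.99)         # mirrors quantile=0.01 above
hue = (np.arctan2(flow[..., 1], flow[..., 0]) + np.pi) / (2 * np.pi)
sat = np.clip(magnitude / vmax, 0.0, 1.0)
rgb = hsv_to_rgb(np.stack([hue, sat, np.ones_like(hue)], axis=-1))
```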

visionsim.cli.transforms.colorize_normals(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', step: int = 1) None[source]

Convert .exr normal maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • step – drop some frames when colorizing, use frames 0+step*n

visionsim.cli.transforms.colorize_segmentations(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', num_objects: int | None = None, shuffle: bool = True, seed: int = 1234, step: int = 1) None[source]

Convert .exr segmentation maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • num_objects – number of unique objects to expect in the scene

  • shuffle – if true, colorize items in a random order

  • seed – seed used when shuffling colors

  • step – drop some frames when colorizing, use frames 0+step*n
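
Example

The id-to-color mapping with shuffling can be sketched as a permuted lookup table (illustrative; the real colormap is likely richer than the grayscale ramp used here):

```python
import numpy as np

# Illustrative id-to-color mapping: each object id indexes into a color
# table whose rows are permuted with a fixed seed, so neighboring ids
# get visually distinct colors.
rng = np.random.default_rng(1234)
num_objects = 8
ids = rng.integers(0, num_objects, (16, 16))        # toy segmentation map
table = np.linspace(0.0, 1.0, num_objects)[:, None] * np.ones(3)
table = table[rng.permutation(num_objects)]          # shuffle colors
colored = table[ids]                                 # (h, w, 3)
```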

visionsim.cli.transforms.tonemap_frames(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', hdr_quantile: float = 0.01) None[source]

Convert .exr linear intensity frames (or composites) into tone-mapped sRGB images

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save tone mapped frames

  • pattern – filenames of frames should match this

  • ext – which format to save tone-mapped frames as

  • hdr_quantile – calculate dynamic range using brightness quantiles instead of extrema
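
Example

Quantile-based tone mapping can be sketched as follows (the sRGB transfer function is standard; the normalization is an assumption about the implementation):

```python
import numpy as np

# Illustrative tone mapping: estimate the dynamic range from brightness
# quantiles rather than extrema, normalize, and gamma-encode to sRGB.
rng = np.random.default_rng(0)
linear = rng.exponential(1.0, (32, 32, 3))          # HDR linear intensities
hdr_quantile = 0.01
lo, hi = np.quantile(linear, [hdr_quantile, 1.0 - hdr_quantile])
x = np.clip((linear - lo) / (hi - lo), 0.0, 1.0)
# Standard sRGB transfer function:
srgb = np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)
out = (srgb * 255).round().astype(np.uint8)
```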

Module contents

visionsim.cli.post_install(executable: str | PathLike | None = None, editable: bool = False)[source]

Install additional dependencies

Parameters:
  • executable (str | os.PathLike | None, optional) – Path to Blender executable. Defaults to one found on $PATH.

  • editable (bool, optional) – If set, install the current visionsim as editable in Blender. Only works if visionsim is already installed as editable locally.

visionsim.cli.main()[source]