visionsim.cli package

Submodules

visionsim.cli.blender module

visionsim.cli.blender.render_animation(blend_file: Path, output_dir: Path, /, config: RenderConfig, frame_start: int | None = None, frame_end: int | None = None, output_file: Path | None = None, dry_run: bool = False) None[source]

Create datasets by rendering out a sequence from a single blend-file.

Parameters:
  • blend_file – Path to blend file.

  • output_dir – Dataset output folder.

  • config – Render configuration.

  • frame_start – Start rendering at this frame index (inclusive).

  • frame_end – Stop rendering at this frame index (inclusive).

  • output_file – If set, write the modified blend file to this path. Helpful for troubleshooting. Defaults to not saving.

  • dry_run – if true, nothing will be rendered at all. Defaults to False.

visionsim.cli.dataset module

visionsim.cli.dataset.convert(input_dir: Path, output_dir: Path | None = None, force: bool = False) None[source]

Convert a .db database to a .json or vice-versa.

Parameters:
  • input_dir – directory in which to look for the dataset

  • output_dir – directory in which to save the new dataset. If not set, the new metadata file is saved in the same directory; otherwise, all data is copied over to a new directory.

  • force – if true, overwrite output file(s) if present

visionsim.cli.dataset.merge(input_files: list[Path], names: list[str] | None = None, output_file: Path = PosixPath('combined.json')) None[source]

Merge one or more dataset files.

Typically there will be one dataset file per data type (frames, depth, etc.), but these can be combined if they are compatible (same number of frames, same camera, etc.) to yield Nerfstudio-compatible “transforms.json” files that might have a “depth_file_path” or “mask_path” in addition to a “file_path”. This does not touch the underlying data; it only modifies the transforms files.

This can be used to rename a data type for a single file, merge multiple metadata files that already have distinct data type names, or merge and rename many metadata files altogether.

Parameters:
  • input_files (list[Path]) – List of datasets to merge; each can be either the path of a metadata file or its directory.

  • names (list[str] | None, optional) – What to rename each “path” argument to. Defaults to None.

  • output_file (Path, optional) – Where to save metadata file, should be a .json file. Defaults to “combined.json”.
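
Example

As a minimal sketch of the merge described above, assuming a simplified Nerfstudio-style layout where each metadata dict holds a “frames” list (the helper name and structure here are illustrative, not the actual implementation):

```python
# Illustrative merge of per-type metadata files (a sketch, not the real
# implementation): each input's "file_path" is renamed to the matching
# entry of `names`, and frame counts are checked for compatibility.
def merge_transforms(datasets, names):
    n = len(datasets[0]["frames"])
    if any(len(ds["frames"]) != n for ds in datasets):
        raise ValueError("datasets must have the same number of frames")
    merged_frames = []
    for i in range(n):
        # Keep any shared per-frame fields (e.g. transform_matrix) ...
        frame = {k: v for k, v in datasets[0]["frames"][i].items()
                 if k != "file_path"}
        # ... and gather one renamed path per input dataset.
        for ds, name in zip(datasets, names):
            frame[name] = ds["frames"][i]["file_path"]
        merged_frames.append(frame)
    merged = {k: v for k, v in datasets[0].items() if k != "frames"}
    merged["frames"] = merged_frames
    return merged

frames_meta = {"frames": [{"file_path": "frames/0000.png"}]}
depths_meta = {"frames": [{"file_path": "depths/0000.exr"}]}
combined = merge_transforms([frames_meta, depths_meta],
                            ["file_path", "depth_file_path"])
```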

visionsim.cli.dataset.to_pointcloud(colors: Path, depths: Path | None = None, points: Path | None = None, output: Path = PosixPath('pointcloud.ply'), p: float = 0.15, binary: bool = True, force: bool = False) None[source]

Generate a .ply point cloud from datasets.

Parameters:
  • colors – path to the dataset to use for point colors; must contain RGB data, assumed to be uint8.

  • depths – path to dataset to use for depth-based 3D points. A pinhole camera model is used to project depth values to 3D points.

  • points – path to dataset to use for world-space 3D points. If set, this will be used instead of depth-based points.

  • output – path to save PLY file to.

  • p – probability of sampling a pixel.

  • binary – If true, save as a binary PLY file (smaller and faster).

  • force – If true, overwrite output file if present.
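
Example

The depth-based projection can be sketched with a standard pinhole model (the intrinsics below are made up for illustration; the actual CLI reads them from the dataset metadata):

```python
import numpy as np

# Illustrative pinhole back-projection, as used for depth-based points
# (a sketch; fx/fy/cx/cy are assumed example intrinsics).
def depth_to_points(depth, fx, fy, cx, cy):
    """Project an (h, w) depth map to (h*w, 3) camera-space 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

pts = depth_to_points(np.ones((4, 4)), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```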

visionsim.cli.emulate module

visionsim.cli.emulate.spad(input_dir: Path, output_dir: Path, flux_gain: float = 1.0, bitplanes: int = 1, bitdepth: int | None = None, force_gray: bool = False, seed: int = 2147483647, pattern: str | None = None, max_size: int = 1000, force: bool = False) None[source]

Perform binomial sampling on linearized RGB frames to yield (summed) single photon frames

This will save numpy files, which may be bitpacked (when bitplanes == 1) and may have different dtypes depending on the number of summed bitplanes. The shape of the output arrays will be (max_size, h, w, c) or (remainder, h, w, c), where remainder = len(dataset) % max_size; the width dimension is ceil(w / 8) when bitpacked.

If the input contains an alpha channel (determined by the last dimension of the input images), it will be stripped.

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save single photon frames

  • pattern – used to find source image files to convert to single photon frames, not needed when input_dir points to a valid dataset.

  • flux_gain – multiplicative factor controlling dynamic range of output

  • bitplanes – number of summed binary measurements

  • bitdepth – if set, bitplanes will be overridden to 2**bitdepth - 1

  • force_gray – if true, disable RGB sensing even if the input images are color

  • seed – random seed to use while sampling, ensures reproducibility

  • max_size – maximum number of frames per output array before rolling over to new file

  • force – if true, overwrite output file(s) if present, else throw error
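
Example

The sampling model can be sketched as follows (an assumption based on the standard single-photon model; the exact implementation may differ):

```python
import numpy as np

# Sketch of the binomial single-photon model (assumed): a pixel with
# linear intensity `phi` fires with probability 1 - exp(-flux_gain * phi);
# summing `bitplanes` such Bernoulli draws gives a binomial photon count.
rng = np.random.default_rng(2147483647)
frames = rng.random((4, 8, 16, 3))        # (n, h, w, c) linear RGB in [0, 1]
flux_gain, bitplanes = 1.0, 1
p = 1.0 - np.exp(-flux_gain * frames)
binary = rng.binomial(bitplanes, p).astype(np.uint8)

# When bitplanes == 1, frames can be bitpacked along the width axis,
# so the stored width becomes ceil(w / 8):
packed = np.packbits(binary, axis=2)
```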

visionsim.cli.emulate.events(input_dir: Path, output_dir: Path, fps: int, pattern: str | None = None, pos_thres: float = 0.2, neg_thres: float = 0.2, sigma_thres: float = 0.03, cutoff_hz: int = 200, leak_rate_hz: float = 1.0, shot_noise_rate_hz: float = 10.0, seed: int = 2147483647, force: bool = False) None[source]

Emulate an event camera using v2e and high-speed input frames

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save events

  • fps – frame rate of input sequence

  • pattern – used to find source image files to convert to events, not needed when input_dir points to a valid dataset.

  • pos_thres – nominal threshold of triggering positive event in log intensity

  • neg_thres – nominal threshold of triggering negative event in log intensity

  • sigma_thres – std deviation of threshold in log intensity

  • cutoff_hz – 3 dB cutoff frequency in Hz of the DVS photoreceptor, default: 200

  • leak_rate_hz – leak event rate per pixel in Hz, from junction leakage in reset switch

  • shot_noise_rate_hz – shot noise rate in Hz

  • seed – random seed to use while sampling, ensures reproducibility

  • force – if true, overwrite output file(s) if present, else throw error

visionsim.cli.emulate.rgb(input_dir: Path, output_dir: Path, chunk_size: int = 10, shutter_frac: float = 1.0, readout_std: float = 16.0, fwc: float | None = None, flux_gain: float = 4096.0, iso_gain: float = 1.0, adc_bitdepth: int = 12, mosaic: bool = False, demosaic: Literal['off', 'bilinear', 'MHC04'] = 'MHC04', denoise_sigma: float = 0.0, sharpen_weight: float = 0.0, pattern: str | None = None, force: bool = False) None[source]

Simulate a real camera, adding read/Poisson noise and tonemapping

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save binary frames

  • chunk_size – number of consecutive frames to average together

  • shutter_frac – fraction of inter-frame duration shutter is active (0 to 1)

  • readout_std – standard deviation of gaussian read noise in photoelectrons

  • fwc – full well capacity of sensor in photoelectrons

  • flux_gain – factor to scale the input images before Poisson simulation

  • iso_gain – gain for photo-electron reading after Poisson rng

  • adc_bitdepth – ADC bitdepth

  • mosaic – if true, simulate mosaiced R/G/B pixels instead of an innately 3-channel sensor

  • demosaic – demosaicing method (default Malvar et al.’s method)

  • denoise_sigma – Gaussian blur with this sigma will be used (default 0.0 disables this)

  • sharpen_weight – weight used in sharpening (default 0.0 disables this)

  • pattern – used to find source image files to convert to rgb frames, not needed when input_dir points to a valid dataset.

  • force – if true, overwrite output file(s) if present
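
Example

A minimal sketch of the noise pipeline suggested by the parameters above (the ADC mapping and full-well value here are assumptions, not the exact implementation):

```python
import numpy as np

# Illustrative sensor model: scale linear intensities by flux_gain, draw
# Poisson photoelectrons, clip to the full well capacity, add Gaussian
# read noise, apply ISO gain, and quantize with an adc_bitdepth-bit ADC.
rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))             # linear RGB in [0, 1]
flux_gain, readout_std, iso_gain = 4096.0, 16.0, 1.0
fwc, adc_bitdepth = 32768.0, 12             # fwc is an assumed value

electrons = rng.poisson(flux_gain * frame).astype(np.float64)
electrons = np.minimum(electrons, fwc)                       # full-well clip
electrons += rng.normal(0.0, readout_std, electrons.shape)   # read noise
levels = 2**adc_bitdepth - 1
digital = np.clip(np.round(iso_gain * electrons / fwc * levels), 0, levels)
```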

visionsim.cli.emulate.imu(input_dir: Path, output_file: Path | None = None, seed: int = 2147483647, gravity: str = '(0.0, 0.0, -9.8)', dt: float = 0.00125, init_bias_acc: str = '(0.0, 0.0, 0.0)', init_bias_gyro: str = '(0.0, 0.0, 0.0)', std_bias_acc: float = 5.5e-05, std_bias_gyro: float = 2e-05, std_acc: float = 0.008, std_gyro: float = 0.0012, force: bool = False) None[source]

Simulate data from a co-located IMU using the poses in a transforms.json or transforms.db file.

Parameters:
  • input_dir – directory in which to look for transforms.

  • output_file – file in which to save simulated IMU data. Prints to stdout if omitted.

  • seed – RNG seed value for reproducibility.

  • gravity – gravity vector in world coordinate frame. Given in m/s^2.

  • dt – time between consecutive transforms.json poses (assumed regularly spaced). Given in seconds.

  • init_bias_acc – initial bias/drift in accelerometer reading. Given in m/s^2.

  • init_bias_gyro – initial bias/drift in gyroscope reading. Given in rad/s.

  • std_bias_acc – stdev for random-walk component of error (drift) in accelerometer. Given in m/(s^3 sqrt(Hz))

  • std_bias_gyro – stdev for random-walk component of error (drift) in gyroscope. Given in rad/(s^2 sqrt(Hz))

  • std_acc – stdev for white-noise component of error in accelerometer. Given in m/(s^2 sqrt(Hz))

  • std_gyro – stdev for white-noise component of error in gyroscope. Given in rad/(s sqrt(Hz))

  • force – if true, overwrite output file(s) if present
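
Example

The error model implied by these parameters can be sketched as a random-walk bias plus white noise, discretized at dt (an illustrative model; the actual discretization may differ):

```python
import numpy as np

# Illustrative IMU error model (accelerometer shown; gyroscope is
# analogous): the bias random walk is scaled by sqrt(dt), while the
# white-noise term is scaled by 1/sqrt(dt), per the units given above.
rng = np.random.default_rng(2147483647)
dt, n = 0.00125, 1000
std_bias_acc, std_acc = 5.5e-5, 0.008
init_bias_acc = np.zeros(3)

bias = init_bias_acc + np.cumsum(
    rng.normal(0.0, std_bias_acc * np.sqrt(dt), (n, 3)), axis=0)
white = rng.normal(0.0, std_acc / np.sqrt(dt), (n, 3))
acc_noise = bias + white    # added on top of the true specific force
```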

visionsim.cli.ffmpeg module

visionsim.cli.ffmpeg.animate(input_dir: Path, pattern: str | None = None, outfile: Path = PosixPath('out.mp4'), fps: float | None = None, crf: int = 22, vcodec: str = 'libx264', step: int = 1, multiple: int | None = None, force: bool = False, bg_color: str = 'black', strip_alpha: bool = False) None[source]

Combine generated frames into an MP4 using ffmpeg wizardry.

This is roughly equivalent to running the “image2” demuxer in ffmpeg, with the added benefit of being able to skip frames using a step size, strip alpha channels from PNGs, and automatically handling the case where the input frames are numpy arrays. If using a dataset as input, the bitplanes attribute will be used to normalize the data to uint8 for visualization.

Parameters:
  • input_dir – directory in which to look for frames

  • pattern – If provided search for files matching this pattern. Otherwise, look for a valid dataset in the input directory.

  • outfile – where to save generated mp4

  • fps – frames per second in video, if None, will be inferred from dataset

  • crf – constant rate factor for video encoding (0-51); lower means better quality but larger files

  • vcodec – video codec to use (either libx264 or libx265)

  • step – drop some frames when making video, use frames 0+step*n

  • multiple – some codecs require size to be a multiple of n

  • force – if true, overwrite output file if present

  • bg_color – for images with transparencies, namely PNGs, use this color as a background

  • strip_alpha – if true, do not pre-process PNGs to remove transparencies

visionsim.cli.ffmpeg.combine(matrix: str, outfile: Path = PosixPath('combined.mp4'), mode: str = 'shortest', color: str = 'white', multiple: int = 2, force: bool = False) None[source]

Combine multiple videos into one by stacking, padding and resizing them using ffmpeg.

Internally this task will first optionally pad all videos to length using ffmpeg’s tpad filter, then scale all videos in a row to the same height, combine rows using the hstack filter, and finally scale the row-videos to the same width and vstack them together.

Parameters:
  • matrix – Way to specify videos to combine as a 2D matrix of file paths

  • outfile – where to save generated mp4

  • mode – if ‘shortest’, the combined video will last as long as the shortest input video. If ‘static’, the last frame of videos that are shorter than the longest input video will be repeated. If ‘pad’, all videos are padded with frames of color to last the same duration.

  • color – color to pad videos with, only used if mode is ‘pad’

  • multiple – some codecs require size to be a multiple of n

  • force – if true, overwrite output file if present

Example

The input videos are specified as a 2D matrix of file paths using the --matrix argument like so:

$ visionsim ffmpeg.combine --matrix='[["a.mp4", "b.mp4"]]' --outfile="output.mp4"

visionsim.cli.ffmpeg.grid(input_dir: Path, width: int = -1, height: int = -1, pattern: str = '*.mp4', outfile: Path = PosixPath('combined.mp4'), force: bool = False) None[source]

Make a mosaic from videos in a folder, organizing them in a grid

Parameters:
  • input_dir – directory containing all video files (mp4s expected)

  • width – width of video grid to produce

  • height – height of video grid to produce

  • pattern – use files that match this pattern as inputs

  • outfile – where to save generated mp4

  • force – if true, overwrite output file if present

visionsim.cli.ffmpeg.count_frames(input_file: Path, /) int[source]

Count the number of frames a video file contains using ffprobe

Parameters:

input_file – video file input

Returns:

Number of frames in video.

Return type:

int

visionsim.cli.ffmpeg.duration(input_file: Path, /) float[source]

Return duration (in seconds) of first video stream in file using ffprobe

Parameters:

input_file – video file input

Returns:

Video duration in seconds.

Return type:

float

visionsim.cli.ffmpeg.dimensions(input_file: Path) tuple[int, int][source]

Return size (WxH in pixels) of first video stream in file using ffprobe

Parameters:

input_file – video file input

Returns:

Video size as a (width, height) tuple.

Return type:

tuple[int, int]

visionsim.cli.ffmpeg.extract(input_file: Path, output_dir: Path, pattern: str = 'frames_%06d.png') None[source]

Extract frames from video file

Parameters:
  • input_file – path to the video file from which to extract frames

  • output_dir – directory in which to save extracted frames

  • pattern – filenames of frames will match this pattern

visionsim.cli.interpolate module

visionsim.cli.interpolate.video(input_file: Path, output_file: Path, method: str = 'rife', n: int = 2) None[source]

Interpolate video by extracting all frames, performing frame-wise interpolation and re-assembling video

Parameters:
  • input_file – path to video file from which to extract frames

  • output_file – path in which to save interpolated video

  • method – interpolation method to use, only RIFE (ECCV22) is supported for now, default: ‘rife’

  • n – interpolation factor, must be a multiple of 2, default: 2

visionsim.cli.interpolate.dataset(input_dir: Path, output_dir: Path, pattern: str | None = None, method: Literal['rife'] = 'rife', n: int = 2) None[source]

Interpolate between a series of frames or a dataset (both its images and poses)

Note

This only works if the dataset has a single camera, as interpolating camera settings or types is not possible. Further, the data needs to be saved as images.

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save interpolated frames

  • pattern – used to find source image files to interpolate from, not needed when input_dir points to a valid dataset.

  • method – interpolation method to use, only RIFE (ECCV22) is supported for now, default: ‘rife’

  • n – interpolation factor, must be a multiple of 2, default: 2
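
Example

Pose interpolation can be sketched as lerping positions and slerping orientations (an assumed scheme for illustration; the actual CLI may differ):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Illustrative pose interpolation: with n = 2, one pose is inserted
# between each adjacent pair, halving the pose spacing.
times = np.array([0.0, 1.0])
rots = Rotation.from_euler("z", [0.0, 90.0], degrees=True)
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

slerp = Slerp(times, rots)                  # spherical interpolation
t_new = np.array([0.0, 0.5, 1.0])
new_rots = slerp(t_new)
new_pos = np.array([
    np.interp(t_new, times, positions[:, k]) for k in range(3)]).T
```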

visionsim.cli.transforms module

visionsim.cli.transforms.colorize_depths(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', cmap: str = 'turbo', ext: str = '.png', vmin: float | None = None, vmax: float | None = None, quantile: float = 0.01, step: int = 1) None[source]

Convert .exr depth maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • cmap – which matplotlib colormap to use

  • ext – which format to save colorized frames as

  • vmin – minimum expected depth used to normalize colormap

  • vmax – maximum expected depth used to normalize colormap

  • quantile – if vmin/vmax are None, use this quantile to estimate them

  • step – drop some frames when colorizing, use frames 0+step*n
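
Example

The quantile-based normalization can be sketched as follows (illustrative; “turbo” is the default colormap named above):

```python
import numpy as np
from matplotlib import colormaps

# Illustrative quantile normalization: when vmin/vmax are not given,
# estimate them from the depth distribution, then map normalized depths
# through the colormap.
depth = np.random.default_rng(0).uniform(0.5, 10.0, (64, 64))
quantile = 0.01
vmin, vmax = np.quantile(depth, [quantile, 1.0 - quantile])
normalized = np.clip((depth - vmin) / (vmax - vmin), 0.0, 1.0)
rgba = colormaps["turbo"](normalized)       # (h, w, 4) floats in [0, 1]
```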

visionsim.cli.transforms.colorize_flows(input_dir: Path, output_dir: Path, direction: Literal['forward', 'backward'] = 'forward', pattern: str = '**/*.exr', ext: str = '.png', vmax: float | None = None, quantile: float = 0.01, step: int = 1) None[source]

Convert .exr optical flow maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • direction – direction of flow to colorize

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • vmax – maximum expected flow magnitude

  • quantile – if vmax is None, use this quantile to estimate it

  • step – drop some frames when colorizing, use frames 0+step*n
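
Example

A common flow-colorization scheme maps direction to hue and magnitude to saturation (an assumption; the actual encoding may differ):

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

# Illustrative flow colorization: flow direction becomes hue, magnitude
# (normalized by a quantile-estimated vmax) becomes saturation.
flow = np.random.default_rng(0).normal(0.0, 2.0, (32, 32, 2))
magnitude = np.linalg.norm(flow, axis=-1)
vmax = np.quantile(magnitude, 0.99)         # mirrors quantile=0.01 above
hue = (np.arctan2(flow[..., 1], flow[..., 0]) + np.pi) / (2 * np.pi)
sat = np.clip(magnitude / vmax, 0.0, 1.0)
rgb = hsv_to_rgb(np.stack([hue, sat, np.ones_like(hue)], axis=-1))
```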

visionsim.cli.transforms.colorize_normals(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', step: int = 1) None[source]

Convert .exr normal maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • step – drop some frames when colorizing, use frames 0+step*n

visionsim.cli.transforms.colorize_segmentations(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', num_objects: int | None = None, shuffle: bool = True, seed: int = 1234, step: int = 1) None[source]

Convert .exr segmentation maps into color-coded images for visualization

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save colorized frames

  • pattern – filenames of frames should match this

  • ext – which format to save colorized frames as

  • num_objects – number of unique objects to expect in the scene

  • shuffle – if true, colorize items in a random order

  • seed – seed used when shuffling colors

  • step – drop some frames when colorizing, use frames 0+step*n
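
Example

The id-to-color mapping with shuffling can be sketched as a permuted lookup table (illustrative; the real colormap is likely richer than the grayscale ramp used here):

```python
import numpy as np

# Illustrative id-to-color mapping: each object id indexes into a color
# table whose rows are permuted with a fixed seed, so neighboring ids
# get visually distinct colors.
rng = np.random.default_rng(1234)
num_objects = 8
ids = rng.integers(0, num_objects, (16, 16))        # toy segmentation map
table = np.linspace(0.0, 1.0, num_objects)[:, None] * np.ones(3)
table = table[rng.permutation(num_objects)]          # shuffle colors
colored = table[ids]                                 # (h, w, 3)
```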

visionsim.cli.transforms.tonemap_frames(input_dir: Path, output_dir: Path, pattern: str = '**/*.exr', ext: str = '.png', hdr_quantile: float = 0.01) None[source]

Convert .exr linear intensity frames (or composites) into tone-mapped sRGB images

Parameters:
  • input_dir – directory in which to look for frames

  • output_dir – directory in which to save tone mapped frames

  • pattern – filenames of frames should match this

  • ext – which format to save tone-mapped frames as

  • hdr_quantile – calculate dynamic range using brightness quantiles instead of extrema
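
Example

Quantile-based tone mapping can be sketched as follows (the sRGB transfer function is standard; the normalization is an assumption about the implementation):

```python
import numpy as np

# Illustrative tone mapping: estimate the dynamic range from brightness
# quantiles rather than extrema, normalize, and gamma-encode to sRGB.
rng = np.random.default_rng(0)
linear = rng.exponential(1.0, (32, 32, 3))          # HDR linear intensities
hdr_quantile = 0.01
lo, hi = np.quantile(linear, [hdr_quantile, 1.0 - hdr_quantile])
x = np.clip((linear - lo) / (hi - lo), 0.0, 1.0)
# Standard sRGB transfer function:
srgb = np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)
out = (srgb * 255).round().astype(np.uint8)
```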

Module contents

visionsim.cli.post_install(executable: str | PathLike | None = None, editable: bool = False)[source]

Install additional dependencies

Parameters:
  • executable (str | os.PathLike | None, optional) – Path to Blender executable. Defaults to one found on $PATH.

  • editable (bool, optional) – If set, install the current visionsim as editable in Blender. Only works if visionsim is already installed as editable locally.

visionsim.cli.main()[source]