visionsim.dataset package

Submodules

visionsim.dataset.dataset module

class visionsim.dataset.dataset.PathTransforms(paths: Sequence[Path], iter_npys: bool = True, **kwargs)[source]

Bases: object

Given a sequence of paths to load from, yield the minimal transforms dictionary for each path.

Specifically, for image paths we simply yield {"file_path": paths[idx]} for every index, but if some of the paths are numpy arrays and iter_npys is true, we unpack the array’s first dimension and return the corresponding offset as well. For instance, if paths points to a png, an npy of shape (4, H, W, C), and another png, an index of 4 will return the path to the numpy file and an offset of 3.
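The flattening described above can be pictured with a short standalone sketch (this is illustrative, not the library’s implementation): .npy paths expand into one entry per slice along their first dimension, carrying an "offset" key, while image paths contribute a single entry.

```python
from pathlib import Path

def flatten_paths(paths, npy_lengths):
    """npy_lengths maps each .npy path to its first-dimension length."""
    entries = []
    for p in paths:
        if p.suffix == ".npy":
            # Expand the array's first dimension into one entry per slice.
            for offset in range(npy_lengths[p]):
                entries.append({"file_path": p, "offset": offset})
        else:
            entries.append({"file_path": p})
    return entries

paths = [Path("a.png"), Path("b.npy"), Path("c.png")]
entries = flatten_paths(paths, {Path("b.npy"): 4})
# entries[0] is the first png; entries[1:5] cover the npy with offsets 0-3;
# entries[5] is the second png, for 6 entries in total.
```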

__init__(paths: Sequence[Path], iter_npys: bool = True, **kwargs) → None[source]

Initialize a sequence of “dummy” transform dictionaries from a set of paths.

Parameters:
  • paths (Sequence[Path]) – Paths to yield from

  • iter_npys (bool, optional) – If true, yield from numpy arrays before moving on to the next path. Defaults to True.

  • **kwargs (dict[str, Any]) – Additional key/value pairs to include in each transform dict.

class visionsim.dataset.dataset.Dataset(transforms: Sequence[dict[str, Any]], root: str | PathLike | None = None, cameras: set[Camera] | None = None)[source]

Bases: Dataset

Main dataset class for loading a .db/.json dataset or a set of image/exr/npy files.

__init__(transforms: Sequence[dict[str, Any]], root: str | PathLike | None = None, cameras: set[Camera] | None = None) → None[source]

Initialize a dataset object.

Note

No data validation is performed here, you likely want to use one of the classmethods such as from_path() or from_pattern() instead.

Parameters:
  • transforms (Sequence[dict[str, Any]]) – A sequence of transforms dicts, which at a minimum should have a file_path key defined.

  • root (str | os.PathLike | None, optional) – Dataset root directory, if supplied all file_paths are assumed to be relative to it. Defaults to None.

  • cameras (set[Camera] | None, optional) – Set of camera objects. Defaults to None.

classmethod from_path(root: str | PathLike) → Self[source]

Load a dataset from a path.

Parameters:

root (str | os.PathLike) – Path to dataset file (either a .db or .json file) or a directory containing a valid dataset.

Raises:

RuntimeError – raised if a dataset is not found at the provided path, or if multiple datasets are found.

Returns:

instantiated Dataset object

Return type:

Self

classmethod from_paths(paths: Sequence[Path], iter_npys: bool = True, root: str | PathLike | None = None, cameras: set[Camera] | None = None, **kwargs) → Self[source]

Create a dataset object from a collection of data files.

Parameters:
  • paths (Sequence[Path]) – Paths to load data from.

  • iter_npys (bool, optional) – If true, step into the first dimension of any numpy files when iterating over data. Defaults to True.

  • root (str | os.PathLike | None, optional) – Dataset root directory, if supplied all file_paths are assumed to be relative to it. Defaults to None.

  • cameras (set[Camera] | None, optional) – Set of camera objects. Defaults to None.

  • **kwargs (dict[str, Any]) – Optional keyword arguments passed to PathTransforms

Raises:

ValueError – raised if provided paths do not exist or if they are not subpaths of root (when provided).

Returns:

instantiated Dataset object

Return type:

Self

classmethod from_pattern(root: str | os.PathLike, pattern: str, cameras: set[Camera] | None = None, iter_npys: bool = True, key: Callable[[Any], Any] = natsort.natsort_keygen(), **kwargs) → Self[source]

Same as from_paths() but searches for all paths that match the provided pattern (as found by pathlib’s glob).
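The default key performs a natural sort on the matched paths. A standalone sketch (not visionsim’s code) shows why that matters for numbered frames:

```python
import re

# Lexicographic order puts "frame_10" before "frame_2"; a natural-sort
# key splits names into digit and non-digit runs so numbers compare
# numerically (natsort provides a more robust version of this idea).
def natural_key(name):
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", name)]

names = ["frame_10.png", "frame_2.png", "frame_1.png"]
assert sorted(names) == ["frame_1.png", "frame_10.png", "frame_2.png"]
ordered = sorted(names, key=natural_key)
# → ["frame_1.png", "frame_2.png", "frame_10.png"]
```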

property paths: list[Path][source]

List of all data file paths (normalized)

property poses: list[list[list[float]] | ndarray[tuple[Any, ...], dtype[floating]]] | None[source]

List of all camera poses, if available

static load_data(path: str | os.PathLike, idx: tuple[int | slice, ...] = (), auto_collapse: bool = True, bitpack_dim: Literal[0, 1, 2] | None = None, unpacked_size: int | None = None) → int | float | npt.NDArray[source]

Load data from provided path, optionally slicing it.

Supports various image formats (as provided by imageio’s imread), EXR files, and numpy arrays (optionally bitpacked).

Note

This function uses OpenEXR to read exr files because neither imageio nor opencv can read an exr file when the data is stored in any channel other than RGB(A). As of Blender v4, single-channel data such as depth maps is correctly saved as single-channel exrs, in the V channel. Previously, Blender saved these as RGB by duplicating the data channel-wise. This function (optionally) auto-detects this case and returns only a single-channel numpy array.

Note

Numpy arrays are not loaded into memory; instead they are memory-mapped, making this function safe to use with very large arrays.

Parameters:
  • path (str | os.PathLike) – Path to the image file or numpy array.

  • idx (tuple[int | slice], optional) – If present, slice the data using this index. In most cases this is equivalent to slicing the data after loading it, but for bitpacked numpy arrays, the slice needs to be modified first. Defaults to empty tuple (no slicing).

  • auto_collapse (bool, optional) – If true, when loading an EXR file that has duplicated channels, collapse them down into a single channel. See note for more. Only used when loading an EXR file that is saved using the “RGB” channel. Defaults to True.

  • bitpack_dim (Literal[0, 1, 2] | None, optional) – Axis along which bits have been packed. Only used when loading data from a numpy file. Defaults to None.

  • unpacked_size (int | None, optional) – Length of the bitpacked axis once unpacked; if not specified, data will be returned in a larger array whose length along that axis is a multiple of 8. Only used when loading from a numpy array that is bitpacked. Defaults to None.

Returns:

Data loaded from path

Return type:

int | float | npt.NDArray
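The bitpacking round-trip that bitpack_dim and unpacked_size describe can be pictured with numpy’s own packbits/unpackbits routines (this is an illustrative sketch, not the library’s internals):

```python
import numpy as np

# Binary frames packed along axis 0: that axis shrinks by 8x (rounded up).
rng = np.random.default_rng(0)
frames = rng.integers(0, 2, size=(13, 4, 4), dtype=np.uint8)

packed = np.packbits(frames, axis=0)      # shape (2, 4, 4): ceil(13 / 8) = 2
unpacked = np.unpackbits(packed, axis=0)  # shape (16, 4, 4): padded, not 13

# unpacked_size plays the role of `count` here: trim the zero padding
# back to the true length of the original axis.
restored = np.unpackbits(packed, axis=0, count=13)
assert restored.shape == (13, 4, 4)
assert (restored == frames).all()
```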

visionsim.dataset.models module

class visionsim.dataset.models.Camera(*, camera_model: Literal['OPENCV', 'OPENCV_FISHEYE'] | None = None, fl_x: float | None = None, fl_y: float | None = None, cx: float | None = None, cy: float | None = None, h: int | None = None, w: int | None = None, c: int | None = None, k1: float | None = None, k2: float | None = None, k3: float | None = None, k4: float | None = None, p1: float | None = None, p2: float | None = None, fps: float | None = None, **extra_data: Any)[source]

Bases: BaseModel

Camera Intrinsics

model_config = {'extra': 'allow', 'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

camera_model: Literal['OPENCV', 'OPENCV_FISHEYE'] | None

camera model type

fl_x: float | None

focal length x

fl_y: float | None

focal length y

cx: float | None

principal point x

cy: float | None

principal point y

h: int | None

image height

w: int | None

image width

c: int | None

image channels

k1: float | None

first radial distortion parameter, used by [OPENCV, OPENCV_FISHEYE]

k2: float | None

second radial distortion parameter, used by [OPENCV, OPENCV_FISHEYE]

k3: float | None

third radial distortion parameter, used by [OPENCV_FISHEYE]

k4: float | None

fourth radial distortion parameter, used by [OPENCV_FISHEYE]

p1: float | None

first tangential distortion parameter, used by [OPENCV]

p2: float | None

second tangential distortion parameter, used by [OPENCV]

fps: float | None

framerate of camera
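The intrinsics fields above map onto the usual 3x3 pinhole camera matrix. The helper below is hypothetical (not part of visionsim) and only illustrates how fl_x, fl_y, cx, and cy fit together:

```python
import numpy as np

# Hypothetical helper: assemble the pinhole intrinsics matrix K from the
# focal lengths and principal point documented above.
def intrinsics_matrix(fl_x, fl_y, cx, cy):
    return np.array([
        [fl_x, 0.0, cx],
        [0.0, fl_y, cy],
        [0.0, 0.0, 1.0],
    ])

K = intrinsics_matrix(fl_x=600.0, fl_y=600.0, cx=320.0, cy=240.0)

# Projecting a camera-space point (x, y, z) to pixels: divide K @ p by z.
p = np.array([0.5, -0.25, 2.0])
u, v, _ = (K @ p) / p[2]
# → u = 470.0, v = 165.0
```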

class visionsim.dataset.models.Data(*, file_path: Path | None = None, bitpack_dim: int | None = None, bitplanes: int | None = None, **extra_data: Any)[source]

Bases: BaseModel

Frame data

model_config = {'extra': 'allow', 'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

file_path: Path | None

path to data, usually an image or ndarray file

bitpack_dim: int | None

dimension that has been bitpacked

bitplanes: int | None

number of summed bitplanes in image

class visionsim.dataset.models.Frame(*, file_path: Path | None = None, bitpack_dim: int | None = None, bitplanes: int | None = None, camera_model: Literal['OPENCV', 'OPENCV_FISHEYE'] | None = None, fl_x: float | None = None, fl_y: float | None = None, cx: float | None = None, cy: float | None = None, h: int | None = None, w: int | None = None, c: int | None = None, k1: float | None = None, k2: float | None = None, k3: float | None = None, k4: float | None = None, p1: float | None = None, p2: float | None = None, fps: float | None = None, transform_matrix: Annotated[list[list[float]], AfterValidator(func=_validate_transform_matrix)], offset: int | None = None, **extra_data: Any)[source]

Bases: Camera, Data

Frame information

model_config = {'extra': 'allow', 'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

transform_matrix: Annotated[_Matrix4x4, AfterValidator(_validate_transform_matrix)]

camera pose (orientation and position) as a 4x4 matrix

offset: int | None

index of frame, used when file_path is an .npy file

class visionsim.dataset.models.Metadata(*, camera_model: Literal['OPENCV', 'OPENCV_FISHEYE'] | None = None, fl_x: float | None = None, fl_y: float | None = None, cx: float | None = None, cy: float | None = None, h: int | None = None, w: int | None = None, c: int | None = None, k1: float | None = None, k2: float | None = None, k3: float | None = None, k4: float | None = None, p1: float | None = None, p2: float | None = None, fps: float | None = None, frames: list[Frame], **extra_data: Any)[source]

Bases: Camera

A superset of the Nerfstudio transforms.json format, which enables the use of numpy arrays for single-photon data and allows for additional data paths (e.g. flow/segmentation) and metadata attributes such as a channels dimension.
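A minimal metadata payload in this format might look as follows. Field names follow the Camera and Frame models documented in this module; the values are made-up examples:

```python
import json

# Minimal transforms metadata: global intrinsics plus a list of frames,
# each with a file_path and a 4x4 camera pose (identity here).
meta = {
    "camera_model": "OPENCV",
    "fl_x": 600.0, "fl_y": 600.0,
    "cx": 320.0, "cy": 240.0,
    "w": 640, "h": 480, "c": 3,
    "frames": [
        {
            "file_path": "frames/0000.png",
            "transform_matrix": [
                [1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
            ],
        }
    ],
}
text = json.dumps(meta, indent=2)  # shape of a transforms.json on disk
```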

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

model_config = {'extra': 'allow', 'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

frames: list[Frame]

per-frame data, intrinsics and extrinsics parameters

classmethod load(path: str | PathLike, rename_to: str = 'file_path') → Self[source]

Load metadata from a .json or .db transforms file.

Parameters:
  • path (str | os.PathLike) – Path to load metadata from.

  • rename_to (str, optional) – Load data paths from a .db file as a different key. Defaults to “file_path”.

Raises:
  • RuntimeError – raised if loading camera configurations fails.

  • ValueError – raised if file format is not understood.

Returns:

instantiated Metadata object

Return type:

Self

classmethod from_path(path: str | PathLike, rename_to: str = 'file_path') → Self[source]

Same as load() with the added bonus of path disambiguation, where path can also be the directory containing the metadata file.

save(path: str | PathLike, *, indent: int = 2) → None[source]

Save metadata to a .json or .db transforms file.

Parameters:
  • path (str | os.PathLike) – Path to save metadata to.

  • indent (int, optional) – Indent amount to use when saving JSON file. Defaults to 2.

classmethod from_dense_transforms(transforms: Sequence[dict[str, Any]]) → Self[source]

Load metadata from a sequence of dictionaries which contain all frame and camera information.

Parameters:

transforms (Sequence[dict[str, Any]]) – Dictionaries containing frame information such as “file_path”, “transform_matrix” and camera parameters.

Returns:

instantiated Metadata object

Return type:

Self

classmethod from_frames(frames: Sequence[Frame] | Sequence[dict[str, Any]], camera: Camera | dict[str, Any] | None = None) → Self[source]

Load metadata from Frame objects (or their model dicts) and a single Camera object (or model dict).

Parameters:
  • frames (Sequence[Frame] | Sequence[dict[str, Any]]) – Frame instances to load from.

  • camera (Camera | dict[str, Any] | None, optional) – Global camera to use; if multiple cameras are needed, pass them as part of the frames. Defaults to None (use frame cameras).

Returns:

instantiated Metadata object

Return type:

Self

iter_dense_transforms(data_type: str | None = None, rename_to: str = 'path', relative_to: Path | None = None) → Iterator[dict[str, Any]][source]

Yield dictionaries containing all frame and camera information, one per frame.

Parameters:
  • data_type (str | None, optional) – Select which data type to iterate over, since there might be multiple (“file_path”, “mask_path”, etc). Defaults to None (all available).

  • rename_to (str, optional) – Rename key of iterated data, for instance from “file_path” to “path”. Only used if data_type is set. Defaults to “path”.

  • relative_to (Path | None, optional) – Make data paths relative to provided path. Defaults to not modifying paths (None).

Yields:

Iterator[dict[str, Any]] – Dictionaries containing all relevant frame data
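The shape of a “dense” transform dict can be sketched without the library: global camera fields merged with each frame’s own fields, one dict per frame (hypothetical values; not the library’s implementation):

```python
# Merge global intrinsics into each per-frame dict; per-frame keys would
# override the global ones if both were present.
camera = {"fl_x": 600.0, "fl_y": 600.0, "cx": 320.0, "cy": 240.0}
frames = [{"file_path": f"frames/{i:04d}.png"} for i in range(3)]
dense = [{**camera, **frame} for frame in frames]
# Each entry now carries both the intrinsics and its own file_path.
```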

to_dense_transforms(*args, **kwargs) → list[dict[str, Any]][source]

Same as iter_dense_transforms() but returns a list instead of a generator.

property data_types: set[str]

Data types that are defined for each frame, such as file_path or depth_file_path.

property cameras: set[Camera]

Set of defined cameras.

property poses: list[list[list[float]] | ndarray[tuple[Any, ...], dtype[floating]]]

Pose matrices of all frames.

property path: Path | None

Path to loaded metadata file, may be undefined.

property arclength: float[source]

Calculate the length of the trajectory
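One natural way to compute a trajectory’s arclength is to sum the distances between consecutive camera positions, i.e. the translation column of each 4x4 pose (an illustrative computation, not necessarily the library’s exact method):

```python
import numpy as np

# Three poses tracing a right-angle path: origin → (1,0,0) → (1,1,0).
poses = np.stack([np.eye(4)] * 3)
poses[1, :3, 3] = [1.0, 0.0, 0.0]   # move 1 unit along x
poses[2, :3, 3] = [1.0, 1.0, 0.0]   # then 1 unit along y

positions = poses[:, :3, 3]          # (N, 3) translation of each pose
arclength = np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()
# → 2.0 for this path
```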

Module contents