Access to raw data

Attention

Information on this page is intended for programmers who would like to process raw recordings made with ScannedReality Studio in their own applications. Correspondingly, the following assumes that the reader has advanced technical knowledge.

Raw recordings made with ScannedReality Studio are saved in a workspace folder configured by the user. This documentation page describes how to parse the data in such recordings.

Note that the dataset format is designed to handle networked recording setups, to support different camera types with different metadata, and to allow the format to evolve while preserving compatibility. As a result, the format is slightly non-trivial to parse.

File system structure

Each raw recording is stored in its own folder in the file system. The files and folders within this recording folder follow a fixed structure.

On the top level, there are several configuration and metadata files in YAML format, as well as the calibration and dataset folders:

  • calibration

    • extrinsics.yaml

  • dataset

    • <hostname1>

    • <hostname2>

    • <hostnameN>

  • configuration.yaml

  • dataset.yaml

  • sensor_node_config.yaml

The dataset folder contains one subfolder for each machine that took part in recording the dataset, with the subfolder name being the machine’s hostname. Since no networking functionality is currently available in the public version of ScannedReality Studio, only one hostname will be present in datasets recorded with this version.

Each hostname folder contains a file hostinfo.yaml, which lists the devices whose data was recorded on this host. In addition, for each device, there is a subfolder whose name follows the format <devicetype>_<serialnumber>:

  • hostinfo.yaml

  • <devicetype1>_<serialnumber1>

  • <devicetype2>_<serialnumber2>

  • <devicetypeM>_<serialnumberM>

Each device’s subfolder contains a file for each sensor of this device that was recorded. For example, an RGB-D camera will have a file for its color image stream and a separate file for its depth image stream. These files are named according to the format <sensortype>_<sensorNameInDevice>. Available sensors are listed in hostinfo.yaml.

Configuration and metadata files

As the configuration and metadata files are not required for parsing the data, they are currently not documented here. Many of their contents should be self-explanatory, however. On a high level, these files store the following information:

  • configuration.yaml stores the program settings with which the dataset was recorded.

  • dataset.yaml stores metadata about the dataset, such as the dataset name entered by the user, and the start and end time of recording, represented as Unix timestamps in nanoseconds.

  • sensor_node_config.yaml stores the network configuration of machines that participated in the recording.

Sensor data files

Sensor data files follow a custom file format. There are two variants of this format with slight differences:

  • A variant for sensors which record data of constant size (for example, cameras recording uncompressed images).

  • A variant for sensors which record data of variable size (for example, cameras recording compressed images).

For Azure Kinect and Femto Bolt data, the color camera images are compressed as MJPEG, so the variable-size variant is used for the color sensor; the depth camera records uncompressed images, so the constant-size variant is used for the depth sensor.

All numbers in the files are stored in little-endian encoding.
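
For example, a multi-byte value can be decoded portably by assembling it from individual bytes, regardless of the host machine's endianness. A minimal C++ sketch (the function name is ours):

    #include <cstdint>

    // Reads a little-endian unsigned 32-bit integer from a byte buffer.
    // Assembling the value byte by byte works on any host endianness.
    uint32_t ReadU32LE(const unsigned char* p) {
      return static_cast<uint32_t>(p[0]) |
             (static_cast<uint32_t>(p[1]) << 8) |
             (static_cast<uint32_t>(p[2]) << 16) |
             (static_cast<uint32_t>(p[3]) << 24);
    }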

Header text

The header text is formatted as YAML. It describes the format of the data that follows. The purpose of this self-describing file format is to handle different file format versions effectively: in many cases, it allows data fields to be added or removed without breaking parsing code.

The header text YAML may have up to two nodes at its root:

  • record_format: This node contains a description of the data records within the later part of the file.

  • custom: This optional node may store additional metadata about the data. For example, for cameras, the resolution of the camera’s images is given here. Any calibration information obtained from the device is also stored here.

The record_format node contains a sequence of fields, describing the format of one data record by listing its components. Each field is specified by the following nodes:

  • name: A name to identify the field.

  • comment (optional): A comment on this field.

  • type: Type of this field.

  • count: Number of elements in this field, or zero to indicate variable size.

When parsing the header text, look for the fields that you are interested in, identifying them by name. For parsing images, look for the following fields with well-defined names:

  • image: The image data, in the specified format (MJPEG for Azure Kinect / Femto Bolt color images, uncompressed unsigned 16-bit values for depth images).

  • timestamp_ns: Image timestamp in system clock (i.e., Unix) time, in nanoseconds.

The field type node may have the following values:

  • u8: unsigned 8-bit integer

  • i8: signed 8-bit integer

  • u16: unsigned 16-bit integer

  • i16: signed 16-bit integer

  • u32: unsigned 32-bit integer

  • i32: signed 32-bit integer

  • u64: unsigned 64-bit integer

  • i64: signed 64-bit integer

  • f32: 32-bit floating-point value (float in C/C++)

  • f64: 64-bit floating-point value (double in C/C++)

The count node gives the number of values for each occurrence of this field. For example, with a type of u32 and a count of 3, each occurrence of this field would consist of three unsigned 32-bit integers.

However, there is an exception to the above: The variable-sized variant of the file format may contain fields with a count of zero. This indicates that this field has variable size. In this case, the field’s type has a different meaning: Instead of specifying the field’s data type, it specifies the field’s size type. This is better explained by looking at how a variable-sized field’s content is laid out in memory (in the later part of a data file): For each occurrence of this field, the first few bytes specify the length of the occurrence. The remaining bytes contain the actual data. For example, with a type of u16, the first 16 bits of each field occurrence would be an unsigned 16-bit integer specifying the remaining size of the occurrence in bytes. If the field instead had a type of u32, then the first 32 (instead of 16) bits would be an unsigned integer specifying the size of the occurrence.
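
For illustration, the header text of a color camera file might look roughly like the following (a sketch only, not a verbatim header; the comment text and the width and height values are examples):

    record_format:
      - name: image
        comment: MJPEG-compressed color image
        type: u32    # with count 0, this is the size type of the field
        count: 0
      - name: timestamp_ns
        type: u64
        count: 1
    custom:
      camera_sensor:
        width: 1920
        height: 1080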

Data records

The recorded sensor data follows the header text. It consists of a number of data records, packed tightly after each other. Each data record is formatted according to the record_format node from the header text.

For example, if the record format consists of one field with a type of u8 and a count of 4, and a second field with a type of u16 and a count of 0, then each data record would start with four bytes of unsigned 8-bit data, followed by an unsigned 16-bit value giving the size of the variable-sized field, followed by the corresponding amount of variable-sized data.
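
A minimal C++ sketch that reads one record of exactly this example format (four u8 values, then a u16-size-prefixed variable field), keeping in mind that the size prefix is little-endian:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Reads one record of the example format: a fixed field of four u8
    // values, followed by a variable-sized field with a u16 size prefix.
    // Returns false if the record is incomplete (e.g., a truncated file).
    bool ReadExampleRecord(FILE* f, uint8_t fixed[4], std::vector<uint8_t>* variable) {
      if (fread(fixed, 1, 4, f) != 4) { return false; }
      uint8_t sizeBytes[2];
      if (fread(sizeBytes, 1, 2, f) != 2) { return false; }
      // The size prefix is stored in little-endian encoding.
      const uint16_t size = static_cast<uint16_t>(sizeBytes[0]) |
                            (static_cast<uint16_t>(sizeBytes[1]) << 8);
      variable->resize(size);
      return fread(variable->data(), 1, size, f) == size;
    }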

For the constant-size variant of the data file format, the data records continue until the end of the file. Thus, for this variant, you can determine the number of data records by subtracting the header size from the file size and dividing the result by the (constant) size of a data record.

Be aware that if recording is interrupted, the file might not end cleanly at the end of a data record. In this case, the incomplete last record should be silently ignored, and only the preceding complete records should be parsed.
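
In code, this record count computation is a single integer division, which conveniently also drops a trailing incomplete record (a sketch; fileSize, headerSize, and recordSize are assumed to have been determined beforehand):

    #include <cstdint>

    // Number of complete records in a constant-size-variant file. Integer
    // division silently ignores a potentially truncated record at the end.
    uint64_t CountRecords(uint64_t fileSize, uint64_t headerSize, uint64_t recordSize) {
      return (fileSize - headerSize) / recordSize;
    }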

For the variable-size variant of the data file format, the data records either go until the end of the file or, if the file contains an index chunk, until the start of that chunk.

An index chunk may only exist in the variable-size variant. It is always located at the end of the file, following the data records. This chunk stores the start offset of each data record, in bytes, from the start of the file, which allows quick access to each record by index without first parsing the whole file. Note that this chunk is optional; the same information can be obtained by parsing each data record in the file from start to end.

Concretely, the index chunk is composed as follows:

  • One unsigned 64-bit integer validRecordCount, giving the number of valid data records in the file. ‘Valid’ refers to complete records; an incomplete last record, if present, is excluded from this count.

  • An array of unsigned 64-bit integers, with a length of validRecordCount + 1. Array entry i gives the file offset in bytes of the start of the i-th data record. The extra entry at the end gives the end offset of the last (valid) data record. This allows computing the size of data record i as fileIndex[i + 1] - fileIndex[i] without requiring a special case for the last record.

If the index chunk cannot be parsed (because the end of file is encountered early), then it should be ignored and the file should be parsed from scratch instead. This may happen if the program was interrupted while writing the index chunk.
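
The description above gives no explicit marker announcing the chunk's presence, so a parser has to validate a candidate chunk before trusting it. One possible approach (a C++ sketch; ReadU64At() is an assumed helper that seeks to the given 64-bit file offset, see the note below, and reads one little-endian u64) exploits the fact that the final array entry, the end offset of the last record, is also the start offset of the index chunk itself:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Tries to read the index chunk at the end of the file. On success, fills
    // `offsets` with validRecordCount + 1 record offsets and returns true.
    // Returns false if no valid chunk is found; in that case, parse the data
    // records from start to end instead.
    bool TryReadIndexChunk(FILE* f, uint64_t fileSize, uint64_t firstRecordOffset,
                           std::vector<uint64_t>* offsets) {
      // The smallest possible chunk holds validRecordCount plus one offset.
      if (fileSize < firstRecordOffset + 16) { return false; }
      // The last u64 in the file is the end offset of the last valid record,
      // which is also where the index chunk itself starts.
      uint64_t chunkStart;
      if (!ReadU64At(f, fileSize - 8, &chunkStart)) { return false; }
      if (chunkStart < firstRecordOffset || chunkStart > fileSize - 16) { return false; }
      uint64_t validRecordCount;
      if (!ReadU64At(f, chunkStart, &validRecordCount)) { return false; }
      // Guard against garbage counts, then check that the chunk spans exactly
      // validRecordCount + 2 u64 values up to the end of the file.
      if (validRecordCount > fileSize / 8) { return false; }
      if (fileSize - chunkStart != 8 * (validRecordCount + 2)) { return false; }
      offsets->resize(validRecordCount + 1);
      for (uint64_t i = 0; i <= validRecordCount; ++i) {
        if (!ReadU64At(f, chunkStart + 8 * (i + 1), &(*offsets)[i])) { return false; }
      }
      return true;
    }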

Note

When parsing data files, beware: the C/C++ function fseek() takes a long offset. While this is usually not an issue on Linux, where long is typically 64 bits, on Windows long is 32 bits, which limits seeking to 32-bit offsets and causes trouble with large files. To fix this, you may need to use platform-dependent seek functions with 64-bit file offsets.
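
For example, a small portable wrapper could dispatch to _fseeki64() on Windows and fseeko() on POSIX systems:

    #include <cstdint>
    #include <cstdio>

    // Seeks to an absolute 64-bit byte offset, working around fseek()'s
    // 32-bit long offsets on Windows. Returns true on success.
    bool Seek64(FILE* f, uint64_t offset) {
    #ifdef _WIN32
      return _fseeki64(f, static_cast<int64_t>(offset), SEEK_SET) == 0;
    #else
      // On POSIX systems, fseeko() takes an off_t, which is 64 bits on
      // 64-bit Linux (and on 32-bit Linux when compiled with
      // -D_FILE_OFFSET_BITS=64).
      return fseeko(f, static_cast<off_t>(offset), SEEK_SET) == 0;
    #endif
    }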

Extrinsics calibration

The YAML file calibration/extrinsics.yaml stores the user-created extrinsics calibration of the cameras. The devices node within this file contains a list of calibrated devices. Each device is identified by its serial number (serialNumber node).

The following calibration information is given for each camera device:

  • global_t_colorCamera: Translation component of the global_tr_colorCamera transformation.

  • global_q_colorCamera: Rotation component of the global_tr_colorCamera transformation.

  • colorCamera_t_depthCamera: Translation component of the colorCamera_tr_depthCamera transformation.

  • colorCamera_q_depthCamera: Rotation component of the colorCamera_tr_depthCamera transformation.

  • depthScaleFactor: A correction factor that should be applied to all depth values in the device’s depth images.

Each translation component is given as a 3-vector. Each rotation component is given as a unit quaternion with the following order of components: x, y, z, w. Quaternions are interpreted as in the Eigen library.

  • global_tr_colorCamera is the transformation that, when a point is right-multiplied with it, maps the point from the color camera’s local coordinate system to the global coordinate system: p_global = global_tr_colorCamera * p_colorCamera.

  • colorCamera_tr_depthCamera is the transformation that, when a point is right-multiplied with it, maps the point from the depth camera’s local coordinate system to the color camera’s local coordinate system: p_colorCamera = colorCamera_tr_depthCamera * p_depthCamera.

In a camera’s local coordinate system, +x is right, +y is down, and +z is forward. The origin lies at the camera’s projection center.

In the global coordinate system, +x is left, +y is up, and +z is forward. The origin lies on the floor, at the center of the recording area.
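
For example, a 3D point measured by a device's depth camera can be brought into the global coordinate system by chaining the two transformations. A minimal C++ sketch using Eigen (function names are ours); note that Eigen's Quaternion constructor takes its components in w, x, y, z order, while the file stores x, y, z, w:

    #include <Eigen/Geometry>

    // Builds an Eigen quaternion from components stored in x, y, z, w order
    // (as in extrinsics.yaml). Eigen's constructor takes w first!
    Eigen::Quaternionf QuaternionFromXYZW(float x, float y, float z, float w) {
      return Eigen::Quaternionf(w, x, y, z);
    }

    // Transforms a point from the depth camera's local coordinates to global
    // coordinates by chaining the two calibrated transformations:
    // p_global = global_tr_colorCamera * (colorCamera_tr_depthCamera * p_depth).
    Eigen::Vector3f DepthPointToGlobal(
        const Eigen::Quaternionf& global_q_colorCamera,
        const Eigen::Vector3f& global_t_colorCamera,
        const Eigen::Quaternionf& colorCamera_q_depthCamera,
        const Eigen::Vector3f& colorCamera_t_depthCamera,
        const Eigen::Vector3f& p_depthCamera) {
      const Eigen::Vector3f p_colorCamera =
          colorCamera_q_depthCamera * p_depthCamera + colorCamera_t_depthCamera;
      return global_q_colorCamera * p_colorCamera + global_t_colorCamera;
    }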

Intrinsics calibration and image size

The image size and the camera intrinsics, as retrieved from the Azure Kinect or Femto Bolt devices, are available within the header text of their data files.

The image size is given in the nodes custom / camera_sensor / width and custom / camera_sensor / height.

  • For Azure Kinect devices, the intrinsics are in the node custom / additional_info / k4a_intrinsics_parameters.

  • For Femto Bolt devices, the intrinsics are in the node custom / additional_info / orbbec_intrinsics_parameters.

Both device types use the Brown-Conrady / OpenCV camera model, though the Azure Kinect comes with three additional parameters which are not needed for using the calibration. The intrinsics parameters are given as a single string of space-separated values; a parsing sketch follows the parameter list below.

  • For the Azure Kinect, the 15 parameters are given in the same order as documented in the Azure Kinect Sensor SDK.

  • For the Femto Bolt, the 12 parameters are given in the following order:

    • intrinsic.fx

    • intrinsic.fy

    • intrinsic.cx

    • intrinsic.cy

    • distortion.k1

    • distortion.k2

    • distortion.k3

    • distortion.k4

    • distortion.k5

    • distortion.k6

    • distortion.p1

    • distortion.p2
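
A C++ sketch for parsing the Femto Bolt parameter string in the order given above (the struct and function names are ours):

    #include <sstream>
    #include <string>

    struct BrownConradyIntrinsics {
      double fx, fy, cx, cy;
      double k1, k2, k3, k4, k5, k6;
      double p1, p2;
    };

    // Parses the 12 space-separated Femto Bolt intrinsics parameters.
    // Returns false if the string contains fewer values than expected.
    bool ParseFemtoBoltIntrinsics(const std::string& s, BrownConradyIntrinsics* out) {
      std::istringstream stream(s);
      return static_cast<bool>(
          stream >> out->fx >> out->fy >> out->cx >> out->cy
                 >> out->k1 >> out->k2 >> out->k3 >> out->k4 >> out->k5 >> out->k6
                 >> out->p1 >> out->p2);
    }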

Matching camera images

When a user starts recording a dataset, different devices might start recording at slightly different points in time. Thus, do not rely on images with the same record index having been recorded at the same time! Furthermore, the timestamps of images will not match exactly even if the images were recorded simultaneously by two synchronized cameras.

Thus, images from different cameras should be matched so as to minimize the variance among their timestamps.
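
For the simple case of matching a second camera's frames against a reference camera, pairing each frame with the closest timestamp is a reasonable starting point (a C++ sketch; the timestamp list is assumed to be non-empty and sorted in ascending order). With more than two cameras, search for the combination of frames that minimizes the timestamp variance, as described above:

    #include <algorithm>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Returns the index of the timestamp in `sorted` that is closest to `t`.
    size_t FindClosestTimestamp(const std::vector<int64_t>& sorted, int64_t t) {
      const auto it = std::lower_bound(sorted.begin(), sorted.end(), t);
      if (it == sorted.begin()) { return 0; }
      if (it == sorted.end()) { return sorted.size() - 1; }
      // Compare the candidate with its predecessor and keep the closer one.
      const size_t i = static_cast<size_t>(it - sorted.begin());
      return (std::llabs(sorted[i] - t) < std::llabs(sorted[i - 1] - t)) ? i : i - 1;
    }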

Interpreting depth values

Depth values are stored as unsigned 16-bit integers, in millimeters. To convert a depth value to meters and apply the calibration, multiply it by depthScaleFactor / 1000. Here, depthScaleFactor is obtained from the camera’s extrinsics calibration.

Furthermore, depending on your application, you might want to add the depthBias value afterwards (obtained from the dataset’s configuration.yaml). This is the user-configured bias that counteracts time-of-flight inaccuracies on surfaces such as human skin.
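
Putting the last two steps together (a C++ sketch; we assume here that depthBias is expressed in meters, which should be verified against your configuration.yaml):

    #include <cstdint>

    // Converts a raw 16-bit depth value to calibrated meters.
    // depthScaleFactor comes from extrinsics.yaml; depthBias (assumed here to
    // be in meters) comes from configuration.yaml.
    float DepthToMeters(uint16_t rawDepth, float depthScaleFactor, float depthBias) {
      return rawDepth * depthScaleFactor / 1000.0f + depthBias;
    }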