Access to raw data
Attention
Information on this page is intended for programmers who would like to process raw recordings made with ScannedReality Studio in their own applications. Correspondingly, the following assumes that the reader has advanced technical knowledge.
Raw recordings made with ScannedReality Studio are saved in a workspace folder configured by the user. This documentation page describes how to parse the data in such recordings.
Notice that the dataset format is designed to handle networked recording setups, to support different camera types with different metadata, and to facilitate changes to the format while keeping compatibility. As a result, the format is slightly non-trivial to parse.
File system structure
Each raw recording is stored in its own folder in the file system. The files and folders within this recording folder follow a fixed structure.
On the top level, there are several configuration and metadata files in YAML format, as well as the calibration and dataset folders:
calibration
    extrinsics.yaml
dataset
    <hostname1>
    <hostname2>
    …
    <hostnameN>
configuration.yaml
dataset.yaml
sensor_node_config.yaml
The dataset folder contains one subfolder for each machine that took part in recording the dataset, with the subfolder name being the machine’s hostname.
Since no networking functionality is currently available in
the public version of ScannedReality Studio, only one hostname will be present in datasets
recorded with this version.
Each hostname folder contains a file hostinfo.yaml, containing a list of devices whose data was recorded on this host. In addition, for each device, there is a subfolder whose name follows the format <devicetype>_<serialnumber>:
hostinfo.yaml
<devicetype1>_<serialnumber1>
<devicetype2>_<serialnumber2>
…
<devicetypeM>_<serialnumberM>
Each device’s subfolder contains a file for each sensor of this device that was recorded.
For example, an RGB-D camera will have a file for its color image stream and a separate file for its
depth image stream. These files are named according to the format <sensortype>_<sensorNameInDevice>. Available sensors are listed in hostinfo.yaml.
Configuration and metadata files
As the configuration and metadata files are not required for parsing the data, they are currently not documented here. Many of their contents should be self-explanatory, however. On a high level, these files store the following information:
configuration.yaml stores the program settings with which the dataset was recorded.
dataset.yaml stores metadata about the dataset, such as the dataset name entered by the user, and the start and end time of recording, represented as Unix timestamps in nanoseconds.
sensor_node_config.yaml stores the network configuration of machines that participated in the recording.
Sensor data files
Sensor data files follow a custom file format. There are two variants of this format with slight differences:
A variant for sensors which record data of constant size (for example, cameras recording uncompressed images).
A variant for sensors which record data of variable size (for example, cameras recording compressed images).
For Azure Kinect and Femto Bolt data, the color camera images are compressed as MJPEG, so the variable-size variant is used for the color sensor; the depth camera records uncompressed images, so the constant-size variant is used for the depth sensor.
All numbers in the files are stored in little-endian encoding.
Header
Each data file starts with a header, where the first four bytes act as an identifier with known values:
For the constant-size variant, the identifier is the letters “RCST”.
For the variable-size variant, the identifier is the letters “RCSV”.
For the variable-size variant, two unsigned 64-bit integers follow:
The first number is the offset, in bytes, of the index chunk in this file, measured from the start of the file. In case the file does not have an index chunk, this number is zero.
The second number is currently unused and is thus always zero.
For both file variants, the next 32 bits are an unsigned integer giving the size of the header text in bytes. The header text itself follows immediately after.
Header text
The header text is formatted as YAML. It describes the format of the data that follows. The purpose of this self-describing file format is to handle different file format versions effectively: in many cases, it allows adding or removing data fields without breaking parsing code.
The header text YAML may have up to two nodes at its root:
record_format: This node contains a description of the data records within the later part of the file.
custom: This optional node, if it is present, may store additional metadata about the data. For example, for cameras the resolution of the camera’s images is given here. Any calibration information obtained from the device is also stored here.
The record_format node contains a sequence of fields, describing the format of one data record by listing its components.
Each field is specified by the following nodes:
name: A name to identify the field.
comment (optional): A comment on this field.
type: Type of this field.
count: Number of elements in this field, or zero to indicate variable size.
When parsing the header text, look for the fields that you are interested in, identifying them by their name. For parsing images, look out for the following fields with well-defined names:
image: The image data, in the specified format (MJPEG for Azure Kinect / Femto Bolt color images, uncompressed unsigned 16-bit values for depth images).
timestamp_ns: Image timestamp in system clock (i.e., Unix) time, in nanoseconds.
The field type
node may have the following values:
u8
: unsigned 8-bit integeri8
: signed 8-bit integeru16
: unsigned 16-bit integeri16
: signed 16-bit integeru32
: unsigned 32-bit integeri32
: signed 32-bit integeru64
: unsigned 64-bit integeri64
: signed 64-bit integerf32
: 32-bit floating-point value (float
in C/C++)f64
: 64-bit floating-point value (double
in C/C++)
The count node gives the number of values for each occurrence of this field. For example, with a type of u32 and a count of 3, each occurrence of this field would consist of three unsigned 32-bit integers.
However, there is an exception to the above: The variable-size variant of the file format may contain fields with a count of zero. This indicates that the field has variable size. In this case, the field’s type has a different meaning: Instead of specifying the field’s data type, it specifies the field’s size type.
This is best explained by looking at how a variable-sized field’s content is laid out in memory (in the later part of a data file): For each occurrence of this field, the first few bytes specify the length of the occurrence. The remaining bytes contain the actual data. For example, with a type of u16, the first 16 bits of each field occurrence would be an unsigned 16-bit integer specifying the remaining size of the occurrence in bytes. If the field instead had a type of u32, then the first 32 (instead of 16) bits would be an unsigned integer specifying the size of the occurrence.
Data records
The recorded sensor data follows the header text.
It consists of a number of data records, packed tightly after each other.
Each data record is formatted according to the record_format node from the header text.
For example, if the record format consists of one field with a type of u8 and a count of 4, and a second field with a type of u16 and a count of 0, then each data record would start with four bytes of unsigned 8-bit data, followed by an unsigned 16-bit value giving the size of the variable-sized field, followed by the corresponding amount of variable-sized data.
For the constant-size variant of the data file format, the data records go until the end of the file. Thus, for this variant, you may determine the number of data records in the file by querying the file size (minus the header size) and dividing this by the (constant) size of a data record.
Be aware that if recording gets interrupted, the file might not end cleanly on the end of a data record. In this case, the incomplete (last) record should be silently ignored, and only the preceding records parsed.
For the variable-size variant of the data file format, the data records either go until the end of the file, or until the start of the index chunk if the file contains an index chunk.
An index chunk may only exist in the variable-size variant. It is always at the end of the file, following the data records. This chunk stores the start offset of each data record, in bytes, from the start of the file, allowing quick access to each record by index without first parsing the whole file. Notice that this chunk is optional, and the same information can be obtained by parsing each data record in the file from start to end.
Concretely, the index chunk is composed as follows:
One unsigned 64-bit integer validRecordCount, giving the number of valid data records in the file. ‘Valid’ means the number of complete records; a potential incomplete last record is excluded from this count.
An array of unsigned 64-bit integers, with a length of validRecordCount + 1. Array entry i gives the file offset in bytes of the start of the i-th data record. The extra entry at the end gives the end offset of the last (valid) data record. This allows computing the size of data record i as fileIndex[i + 1] - fileIndex[i] without requiring a special case for the last record.
If the index chunk cannot be parsed (because the end of file is encountered early), then it should be ignored and the file should be parsed from scratch instead. This may happen if the program was interrupted while writing the index chunk.
Note
When parsing data files, beware:
The C/C++ function fseek() operates with long offset values. While this tends not to be an issue on Linux, on Windows it limits seeking to 32-bit offset values, which will cause trouble with large files. To fix this, you may need to use platform-dependent seek functions with 64-bit file offsets.
Extrinsics calibration
The YAML file calibration/extrinsics.yaml contains the extrinsics calibration of the cameras as calibrated by the user. The devices node within this file contains a list of calibrated devices. Each device is identified by its serial number (serialNumber node).
The following calibration information is given for each camera device:
global_t_colorCamera: Translation component of the global_tr_colorCamera transformation.
global_q_colorCamera: Rotation component of the global_tr_colorCamera transformation.
colorCamera_t_depthCamera: Translation component of the colorCamera_tr_depthCamera transformation.
colorCamera_q_depthCamera: Rotation component of the colorCamera_tr_depthCamera transformation.
depthScaleFactor: A correction factor that should be applied to all depth values in the device’s depth images.
Each translation component is given as a 3-vector. Each rotation component is given as a unit quaternion with the following order of components: x, y, z, w. Quaternions are interpreted as in the Eigen library.
global_tr_colorCamera is the transformation that, if a point is right-multiplied with it, transforms the point from the color camera’s local coordinate system to the global coordinate system.
colorCamera_tr_depthCamera is the transformation that, if a point is right-multiplied with it, transforms the point from the depth camera’s local coordinate system to the color camera’s local coordinate system.
In a camera’s local coordinate system, +x is right, +y is down, and +z is forward. The origin lies at the camera’s projection center.
In the global coordinate system, +x is left, +y is up, and +z is forward. The origin lies on the floor, at the center of the recording area.
Intrinsics calibration and image size
The image size and the camera intrinsics, as retrieved from the Azure Kinect or Femto Bolt devices, are available within the header text of their data files.
The image size is given in the nodes custom / camera_sensor / width and custom / camera_sensor / height.
For Azure Kinect devices, the intrinsics are in the node custom / additional_info / k4a_intrinsics_parameters.
For Femto Bolt devices, the intrinsics are in the node custom / additional_info / orbbec_intrinsics_parameters.
Both device types use the Brown-Conrady / OpenCV camera model, though the Azure Kinect comes with three additional parameters which are not needed for using the calibration. The intrinsics parameters are given in a string, separated by spaces.
For the Azure Kinect, the 15 parameters are given in the same order as documented in the Azure Kinect Sensor SDK.
For the Femto Bolt, the 12 parameters are given in the following order:
intrinsic.fx
intrinsic.fy
intrinsic.cx
intrinsic.cy
distortion.k1
distortion.k2
distortion.k3
distortion.k4
distortion.k5
distortion.k6
distortion.p1
distortion.p2
Matching camera images
When a user starts recording a dataset, different devices might start recording at slightly different points in time. Thus, do not rely on images with the same record index being recorded at the same time! Furthermore, timestamps of images will not match exactly, even if they have been recorded simultaneously by two synchronized cameras.
Thus, images from different cameras should be matched so as to minimize the variance among their timestamps.
Interpreting depth values
Depth values are stored as unsigned 16-bit integers, in millimeters. To convert a depth value to meters and apply the calibration, multiply it with depthScaleFactor / 1000. Here, depthScaleFactor is obtained from the camera’s extrinsics calibration.
Furthermore, depending on your application, you might want to add the depthBias value afterwards (obtained from the dataset’s configuration.yaml). This is the user-configured bias to counteract time-of-flight inaccuracies on surfaces such as human skin.