Stereo Images
Info
The iFDO standard has supported stereo image rectification since version 2.2.0.
Prerequisites
This guide expects the reader to have a basic understanding of the structure of an iFDO and the parameters needed for stereo image rectification.
Three types of metadata are needed to perform stereo image rectification:
- Which images belong together
- The camera calibration parameters for both cameras
- The translation vector and rotation matrix from one camera to the other
Grouping Images
The iFDO standard has no dedicated structure for grouping the images of the two cameras that form a stereo camera, but the grouping can be derived implicitly from existing fields.
As the subimages of a stereo image are taken at the same time, it is recommended to use the image-datetime field to group the subimages together.
Depending on the dataset, it might also be possible to do this via the filename or the image-sensor field.
Tools, however, should not rely on these, as the image-sensor field is not required in the image-set-items section and the filename format depends heavily on the dataset author.
Warning
It should never be assumed that image entries can be grouped together based on the order of entries in the iFDO. This order might not be preserved by the software loading the iFDO!
Example
In this example, subimages can be grouped together via the image-datetime and image-sensor fields and the image filename.
```` { .json .copy }
"image-set-items": {
    "SVL_Remos3_2024.10.01_01.04.46_L.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-L"}
    },
    "SVL_Remos3_2024.10.01_01.04.46_R.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-R"}
    }
}
````
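The grouping by image-datetime recommended above could be sketched as follows. This is a minimal illustration, not part of the iFDO tooling: the helper name `group_by_datetime` is hypothetical, and the sketch assumes that every entry is a single JSON object carrying its own image-datetime value.

```python
from collections import defaultdict


def group_by_datetime(image_set_items):
    """Group image filenames that share the same image-datetime value."""
    groups = defaultdict(list)
    for filename, entry in image_set_items.items():
        # Assumption: image-datetime is present on every entry.
        groups[entry["image-datetime"]].append(filename)
    # Sort the filenames so the result never depends on the entry order.
    return {timestamp: sorted(names) for timestamp, names in groups.items()}


items = {
    "SVL_Remos3_2024.10.01_01.04.46_R.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
    },
    "SVL_Remos3_2024.10.01_01.04.46_L.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
    },
}
pairs = group_by_datetime(items)
```

Because the grouping key is the timestamp alone, the result is the same regardless of the order in which the entries were loaded.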
Camera Calibration Parameters
The camera calibration parameters needed for stereo image rectification include the distortion parameters \(D\) and the intrinsic camera matrix \(K\).
Both are implicitly defined in the field image-camera-calibration-model:
While \(D\) is defined in the subfield calibration-distortion-coefficients, \(K\) is split between calibration-focal-length-xy-pixel, storing the focal lengths \(f_x\) and \(f_y\), and calibration-principal-point-xy-pixel, storing the center pixels \(c_x\) and \(c_y\).
Example
```` { .json .copy }
"image-set-items": {
    "SVL_Remos3_2024.10.01_01.04.46_L.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [5120.0678, 5092.0016],
            "calibration-principal-point-xy-pixel": [2058.1510, 1265.1542],
            "calibration-distortion-coefficients": [0.287715, 0.034034, -0.00256, 0.01276, 0.0]
        }
    },
    "SVL_Remos3_2024.10.01_01.04.46_R.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [4138.8075, 3898.9478],
            "calibration-principal-point-xy-pixel": [2014.6881, 1427.0768],
            "calibration-distortion-coefficients": [0.22561, 1.14582, -0.04321, 0.01484, 0.0]
        }
    }
}
````
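To illustrate how \(K\) and \(D\) can be recovered from these subfields, the sketch below assembles the intrinsic matrix for the left camera of the example. The function name `intrinsic_matrix` is hypothetical, and the layout assumes the common pinhole form with zero skew, which the field definitions suggest but do not state explicitly.

```python
def intrinsic_matrix(calibration):
    """Build a 3x3 intrinsic matrix K from an image-camera-calibration-model."""
    fx, fy = calibration["calibration-focal-length-xy-pixel"]
    cx, cy = calibration["calibration-principal-point-xy-pixel"]
    # Assumption: pinhole layout with zero skew.
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


left_calibration = {
    "calibration-model-type": "rectilinear air",
    "calibration-focal-length-xy-pixel": [5120.0678, 5092.0016],
    "calibration-principal-point-xy-pixel": [2058.1510, 1265.1542],
    "calibration-distortion-coefficients": [0.287715, 0.034034, -0.00256, 0.01276, 0.0],
}
K_left = intrinsic_matrix(left_calibration)
# D is stored directly and needs no further assembly.
D_left = left_calibration["calibration-distortion-coefficients"]
```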
Relative Translation and Rotation
Finally, the relative translation and rotation between the camera coordinates have to be defined. This is done via the image-stereo-camera-calibration-model field, which stores the rotation matrix (in row-major form) and the transposed translation vector that map the camera coordinates of the image to a reference coordinate system chosen by the author. This reference coordinate system must be the same for both subimages of a stereo image, so that the transformation between the two camera coordinate systems can be retrieved.
- The rotation matrix for converting the camera coordinates of camera A to camera B can then be calculated with \(R_A \cdot R_B^T\)
- The translation vector for converting the camera coordinates of camera A to camera B can then be calculated with \(T_A - T_B\)
\(R_{A/B}\) and \(T_{A/B}\) must be stored in the relative-orientation-matrix and relative-translation fields of every image entry of camera A/B.
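The two formulas above can be sketched in plain Python as follows. The sketch takes the formulas at face value and only assumes that the nine matrix values are stored row by row, as described above. The helper names (`to_rows`, `transpose`, `matmul`, `relative_transform`) are hypothetical. In the usage example, camera B is the reference camera, so the result is simply the transformation stored for camera A.

```python
def to_rows(flat):
    """Turn a row-major list of 9 values into a 3x3 nested list."""
    return [list(flat[i:i + 3]) for i in range(0, 9, 3)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def relative_transform(stereo_a, stereo_b):
    """Apply R_A * R_B^T and T_A - T_B to two stereo calibration models."""
    r_a = to_rows(stereo_a["relative-orientation-matrix"])
    r_b = to_rows(stereo_b["relative-orientation-matrix"])
    rotation = matmul(r_a, transpose(r_b))
    translation = [
        t_a - t_b
        for t_a, t_b in zip(
            stereo_a["relative-translation"], stereo_b["relative-translation"]
        )
    ]
    return rotation, translation


camera_a = {
    "relative-orientation-matrix": [
        0.999971, -0.008383, -0.003264,
        0.008338, 0.999871, 0.014546,
        0.003144, -0.015472, 0.999791,
    ],
    "relative-translation": [-269.88, -2.6599, -3.565],
}
camera_b = {  # reference camera: identity rotation, zero translation
    "relative-orientation-matrix": [1, 0, 0, 0, 1, 0, 0, 0, 1],
    "relative-translation": [0, 0, 0],
}
rotation, translation = relative_transform(camera_a, camera_b)
```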
Example
In most cases the calibration software will produce a transformation from one camera to the other. In this example, the calibration yields a transformation from the right camera to the left.
Defining the left camera’s coordinates as the reference coordinates greatly simplifies the definition of the image-stereo-camera-calibration-model. As the left camera’s coordinates and the reference coordinates are the same, the relative-orientation-matrix is the identity matrix for the left camera and the calibrated rotation matrix for the right camera. Likewise, the relative-translation is the zero vector for the left camera and the calibrated translation vector for the right camera.
```` { .json .copy }
"image-set-items": {
    "SVL_Remos3_2024.10.01_01.04.46_L.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-camera-calibration-model": {...},
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [1, 0, 0, 0, 1, 0, 0, 0, 1],
            "relative-translation": [0, 0, 0]
        }
    },
    "SVL_Remos3_2024.10.01_01.04.46_R.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-camera-calibration-model": {...},
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [0.999971, -0.008383, -0.003264, 0.008338, 0.999871, 0.014546, 0.003144, -0.015472, 0.999791],
            "relative-translation": [-269.88, -2.6599, -3.565]
        }
    }
}
````
Note
This roundabout way of saving the relative transformation between camera coordinates is necessary because the iFDO standard has no direct mechanism to define properties between images or image groups. Using a shared reference as explained above mitigates this “shortcoming” of the JSON format.
Full Example
Combining the steps above for two stereo images results in the following iFDO image-set-items section:
```` { .json .copy }
"image-set-items": {
    "SVL_Remos3_2024.10.01_01.04.46_L.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-L"},
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [5120.0678, 5092.0016],
            "calibration-principal-point-xy-pixel": [2058.1510, 1265.1542],
            "calibration-distortion-coefficients": [0.287715, 0.034034, -0.00256, 0.01276, 0.0]
        },
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [1, 0, 0, 0, 1, 0, 0, 0, 1],
            "relative-translation": [0, 0, 0]
        }
    },
    "SVL_Remos3_2024.10.01_01.04.46_R.jpg": {
        "image-datetime": "2024-10-01 01:04:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-R"},
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [4138.8075, 3898.9478],
            "calibration-principal-point-xy-pixel": [2014.6881, 1427.0768],
            "calibration-distortion-coefficients": [0.22561, 1.14582, -0.04321, 0.01484, 0.0]
        },
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [0.999971, -0.008383, -0.003264, 0.008338, 0.999871, 0.014546, 0.003144, -0.015472, 0.999791],
            "relative-translation": [-269.88, -2.6599, -3.565]
        }
    },
    "SVL_Remos3_2024.10.01_01.05.46_L.jpg": {
        "image-datetime": "2024-10-01 01:05:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-L"},
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [5120.0678, 5092.0016],
            "calibration-principal-point-xy-pixel": [2058.1510, 1265.1542],
            "calibration-distortion-coefficients": [0.287715, 0.034034, -0.00256, 0.01276, 0.0]
        },
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [1, 0, 0, 0, 1, 0, 0, 0, 1],
            "relative-translation": [0, 0, 0]
        }
    },
    "SVL_Remos3_2024.10.01_01.05.46_R.jpg": {
        "image-datetime": "2024-10-01 01:05:46.000",
        "image-sensor": {"name": "station:svluwobs:svluw2:remos1-R"},
        "image-camera-calibration-model": {
            "calibration-model-type": "rectilinear air",
            "calibration-focal-length-xy-pixel": [4138.8075, 3898.9478],
            "calibration-principal-point-xy-pixel": [2014.6881, 1427.0768],
            "calibration-distortion-coefficients": [0.22561, 1.14582, -0.04321, 0.01484, 0.0]
        },
        "image-stereo-camera-calibration-model": {
            "relative-orientation-matrix": [0.999971, -0.008383, -0.003264, 0.008338, 0.999871, 0.014546, 0.003144, -0.015472, 0.999791],
            "relative-translation": [-269.88, -2.6599, -3.565]
        }
    }
}
````
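Before attempting rectification, a tool might check that an image entry carries all of the metadata described in this guide. The following sketch is one possible way to do so. The names `REQUIRED_SUBFIELDS` and `missing_stereo_fields` are hypothetical, and the check only inspects the entry itself, so it assumes that none of the fields are supplied elsewhere in the iFDO.

```python
# Fields used in this guide, keyed by their parent field.
REQUIRED_SUBFIELDS = {
    "image-camera-calibration-model": [
        "calibration-focal-length-xy-pixel",
        "calibration-principal-point-xy-pixel",
        "calibration-distortion-coefficients",
    ],
    "image-stereo-camera-calibration-model": [
        "relative-orientation-matrix",
        "relative-translation",
    ],
}


def missing_stereo_fields(entry):
    """Return the names of stereo-related fields absent from an image entry."""
    missing = []
    if "image-datetime" not in entry:
        missing.append("image-datetime")
    for field, subfields in REQUIRED_SUBFIELDS.items():
        model = entry.get(field)
        if model is None:
            missing.append(field)
            continue
        missing.extend(f"{field}.{sub}" for sub in subfields if sub not in model)
    return missing


incomplete_entry = {
    "image-datetime": "2024-10-01 01:04:46.000",
    "image-camera-calibration-model": {
        "calibration-focal-length-xy-pixel": [5120.0678, 5092.0016],
    },
}
problems = missing_stereo_fields(incomplete_entry)
```

An empty list means the entry contains everything this guide relies on. For `incomplete_entry`, the list names the two missing calibration subfields and the missing stereo calibration model.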