Reproject Disparity Images to 3D Point Clouds
Disparity images can be converted to 3D point clouds using the Q matrix assembled from the MultiSense's factory camera calibration.
The Q matrix can be assembled from the stereo calibration’s P matrix. The equation below gives the general definition of the Q matrix for non-isotropic pixels.
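The general non-isotropic form, consistent with the Q matrix assembled in the Eigen example later in this section, is:

\[Q = \begin{bmatrix} f_y T_x & 0 & 0 & -f_y c_x T_x \\ 0 & f_x T_x & 0 & -f_x c_y T_x \\ 0 & 0 & 0 & f_x f_y T_x \\ 0 & 0 & -f_y & f_y (c_x - c'_x) \end{bmatrix}\]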
When the rectified focal lengths \(f_x\) and \(f_y\) are equal (\(f_x = f_y = f\)), the Q matrix can be simplified to the following representation by dividing each element of the non-isotropic form by \(f \cdot T_x\).
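With \(f_x = f_y = f\) and each element divided by \(f \cdot T_x\), the simplified form is:

\[Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -\frac{1}{T_x} & \frac{c_x - c'_x}{T_x} \end{bmatrix}\]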
The Q matrix can be used directly to convert each pixel of the disparity image to a corresponding 3D point in the left rectified camera’s coordinate frame. This can be done using the following expression
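The reprojection multiplies each pixel's homogeneous coordinates \((u, v, d, 1)\) by Q and normalizes by the resulting scale factor:

\[\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix} \qquad \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} X / W \\ Y / W \\ Z / W \end{bmatrix}\]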
Note
For most applications \((c_x - c'_x)\) will be zero, since the \(c_x\) value from the left P matrix will match the \(c'_x\) value from the right P matrix.
Note
Disparity images transmitted from the MultiSense are quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a 3D point cloud, it must be converted to a floating point pixel representation.
C++
Eigen
This example was developed with Eigen v3.4.0
#include <cstdint>
#include <vector>
#include <Eigen/Dense>
// assume we have our disparity image stored in a standard C++ container in row-major order
std::vector<uint16_t> disparity = {...};
// dimensions of the disparity image
const size_t width = ...;
const size_t height = ...;
// assume these are loaded from the stereo calibration's P matrices
const double fx = ...;
const double fy = ...;
const double cx = ...;
const double cy = ...;
const double tx = ...;
const double cx_prime = ...;
Eigen::Matrix4d Q;
Q << fy*tx, 0.0, 0.0, -fy*cx*tx,
0.0, fx*tx, 0.0, -fx*cy*tx,
0.0, 0.0, 0.0, fx*fy*tx,
0.0, 0.0, -fy, fy*(cx - cx_prime);
std::vector<Eigen::Vector3d> points;
for (size_t v = 0 ; v < height ; ++v)
{
for (size_t u = 0 ; u < width ; ++u)
{
const uint16_t raw_disparity = disparity[v * width + u];
// Skip invalid disparity values
if (raw_disparity == 0)
{
continue;
}
const double d = static_cast<double>(raw_disparity) / 16.0;
const Eigen::Vector3d point = (Q * Eigen::Vector4d{static_cast<double>(u), static_cast<double>(v), d, 1.0}).hnormalized();
points.emplace_back(point);
}
}
Python
import numpy as np
# disparity_image is a np.ndarray of dtype np.uint16 arranged row-major with shape (height, width)
disparity_image = np.array([...], dtype=np.uint16)
height, width = disparity_image.shape
# Convert quantized disparities (1/16 pixel increments) to floating point pixels
disparity = disparity_image.astype(np.float32) / 16.0
# Build the 4x4 Q matrix from calibration
Q = np.array([...], dtype=np.float64)
points = [] # list of XYZ vectors in meters
for v in range(height):
    for u in range(width):
        d = disparity[v, u]
        if d <= 0.0:
            continue  # 0 indicates invalid disparity
        homogeneous = Q @ np.array([float(u), float(v), float(d), 1.0], dtype=np.float64)
        xyz = homogeneous[:3] / homogeneous[3]
        points.append(xyz)
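For full-resolution images the per-pixel loop can be slow in Python. The same reprojection can be vectorized with NumPy; the disparity values and the simplified Q matrix below are hypothetical placeholders, not values from a real calibration:

```python
import numpy as np

# hypothetical 2x2 disparity image quantized to 1/16 pixel; 0 marks invalid pixels
disparity_image = np.array([[16, 0], [32, 64]], dtype=np.uint16)
height, width = disparity_image.shape
disparity = disparity_image.astype(np.float64) / 16.0

# example Q matrix in the simplified isotropic form, with assumed calibration values
f, cx, cy, tx, cx_prime = 600.0, 320.0, 240.0, -0.21, 320.0
Q = np.array([[1.0, 0.0, 0.0, -cx],
              [0.0, 1.0, 0.0, -cy],
              [0.0, 0.0, 0.0, f],
              [0.0, 0.0, -1.0 / tx, (cx - cx_prime) / tx]])

# build homogeneous (u, v, d, 1) columns for the valid pixels only
u, v = np.meshgrid(np.arange(width), np.arange(height))
valid = disparity > 0.0
uvd1 = np.stack([u[valid], v[valid], disparity[valid], np.ones(valid.sum())], axis=0)

# reproject all valid pixels in one matrix multiply, then normalize by W
homogeneous = Q @ uvd1                         # shape (4, N)
points = (homogeneous[:3] / homogeneous[3]).T  # shape (N, 3) XYZ
```

The boolean mask keeps invalid pixels out of the output entirely, matching the `continue` in the loop version.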
LibMultiSense Utilities
The C++ PointCloudUtility or Python PointCloudUtility can be used as a reference for how to reproject points without using an external dependency on a matrix library. They leverage the MultiSense utility function create_color_pointcloud to perform the conversion.
OpenCV
The OpenCV function cv::reprojectImageTo3D can also be used to convert disparity images to 3D points. This is done using the 4x4 perspective transformation matrix Q.
OpenCV C++
// multisense_disparity is an unsigned 16-bit disparity image directly transmitted from the MultiSense. It's assumed
// this is loaded into the program as a cv::Mat with a datatype of CV_16UC1. reprojectImageTo3D takes
// floating point disparity images as input, so ensure we convert to the floating point pixel representation for
// the disparity before calling reprojectImageTo3D
cv::Mat multisense_disparity = ...;
// 4x4 Q matrix assembled from the camera calibration
cv::Mat Q = ...;
cv::Mat disparity;
multisense_disparity.convertTo(disparity, CV_32FC1, 1.0/16.0);
cv::Mat point_cloud;
cv::reprojectImageTo3D(disparity, point_cloud, Q);
OpenCV Python
import cv2
import numpy as np
# multisense_disparity is an unsigned 16-bit disparity image directly transmitted from the MultiSense
multisense_disparity = np.array([...], dtype=np.uint16)
Q = np.array([...], dtype=np.float64)
# reprojectImageTo3D expects float32 disparity values in pixel units
disparity = multisense_disparity.astype(np.float32) * (1.0 / 16.0)
point_cloud = cv2.reprojectImageTo3D(disparity, Q)
Point Clouds with Intensity
The disparity image maps one-to-one onto the left rectified luma image. The LibMultiSense Utilities support creating point clouds colorized with intensity.
Convert Disparity Images to Depth Images
A disparity image can be converted to a depth image using the relationship between the MultiSense camera’s baseline and rectified focal length.
The following depth formula can be evaluated for every disparity pixel using \(f_x\), \(f_y\), and \(T_x\) from the stereo calibration’s extrinsic P matrices.
Note
Disparity images from the MultiSense are transmitted quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a depth image, it must be converted to a floating point pixel representation.
This operation is equivalent to taking only the z values from a reprojected 3D point cloud:
\[z = \frac {f_x * f_y * T_x}{-f_y * d}\]
Warning
Invalid disparity values of 0 should be ignored when converting a disparity image to a depth image. Depth for disparity values of 0 is undefined.
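As a sketch with assumed calibration values (the fx and tx numbers here are hypothetical), the depth conversion can be applied to an entire disparity image at once, writing NaN for invalid pixels:

```python
import numpy as np

# hypothetical rectified focal length and baseline term (tx = P_right[0, 3] / fx, negative)
fx, tx = 600.0, -0.21

# hypothetical 2x2 quantized disparity image; 0 marks invalid pixels
disparity_image = np.array([[16, 0], [32, 64]], dtype=np.uint16)
disparity = disparity_image.astype(np.float64) / 16.0

# invalid pixels stay NaN so downstream consumers can filter them out
depth = np.full(disparity.shape, np.nan)
valid = disparity > 0.0
# z = (fx * fy * Tx) / (-fy * d) reduces to -fx * Tx / d
depth[valid] = -fx * tx / disparity[valid]
```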
The C++ SaveImageUtility or Python SaveImageUtility can be used as a reference for how to create and save depth images without using external dependencies. They leverage the LibMultiSense utility function create_depth_image to perform the conversion.
Create Color Images
The chroma image contains the Cb and Cr channels of the planar YCbCr420 compression scheme. An RGB color image can be created by combining it with the corresponding luma image.
The following C++ code applies the conversion gains to a single YCbCr pixel:
const float red = std::clamp(luma + 1.13983f * (cr - 128.0f), 0.0f, 255.0f);
const float green = std::clamp(luma - 0.39465f * (cb - 128.0f) - 0.58060f * (cr - 128.0f), 0.0f, 255.0f);
const float blue = std::clamp(luma + 2.03211f * (cb - 128.0f), 0.0f, 255.0f);
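The same gains can be sketched in Python for a single pixel; ycbcr_to_rgb is an illustrative helper written for this example, not a LibMultiSense function:

```python
import numpy as np

def ycbcr_to_rgb(luma, cb, cr):
    # apply the conversion gains from the C++ snippet above and clamp to [0, 255]
    r = np.clip(luma + 1.13983 * (cr - 128.0), 0.0, 255.0)
    g = np.clip(luma - 0.39465 * (cb - 128.0) - 0.58060 * (cr - 128.0), 0.0, 255.0)
    b = np.clip(luma + 2.03211 * (cb - 128.0), 0.0, 255.0)
    return float(r), float(g), float(b)
```

Neutral chroma (cb = cr = 128) leaves a gray value unchanged, which is a quick sanity check for the gains.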
The LibMultiSense Utility function create_bgr_from_ycbcr420 can be used to convert corresponding luma and chroma images to BGR images.
Create a Color 3D Point Cloud
Each MultiSense’s main left/right stereo pair consists of two mono cameras. This configuration was chosen to optimize stereo performance over a variety of environmental conditions. Since many applications require color images, certain models of Stereo Cameras (MultiSense S27, MultiSense S30) include a third wide angle aux color camera. All three cameras have their intrinsics and extrinsics calibrated, which allows 3D points computed from the main stereo disparity to be colorized using the data from the color aux camera.
To colorize a point cloud computed from the MultiSense disparity image, each 3D point must be projected into the aux rectified color image using the aux extrinsic projection matrix.
Points that fall outside the aux image bounds or lie behind the aux camera cannot be colorized and should be discarded or tagged appropriately. An aux pixel may correspond to multiple 3D points along a ray, so a simple nearest-point mapping can be applied to resolve small parallax differences.
Approximation For Execution Speed
In practice both \(T_y\) and \(T_z\) translations in the aux camera extrinsic calibration are very small (less than 1mm). By assuming both values are zero, the corresponding aux color pixel can be determined using the ratio of the aux stereo baseline and the primary stereo baseline.
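Under this zero-\(T_y\)/\(T_z\) assumption, the lookup reduces to scaling the disparity by the baseline ratio. The sketch below assumes the rectified aux and left images share focal lengths and principal points, and that both tx terms are the (negative) baseline values from the respective P matrices; the exact formula and sign convention should be verified against your calibration:

```python
def aux_pixel(u, v, d, tx_aux, tx):
    # shift the left rectified column by the disparity scaled with the
    # aux-to-primary baseline ratio; the row is unchanged because T_y
    # and T_z are assumed to be zero
    u_aux = u - d * (tx_aux / tx)
    return u_aux, v

# hypothetical baselines: aux baseline is one third of the primary baseline
u_aux, v_aux = aux_pixel(100.0, 50.0, 15.0, tx_aux=-0.07, tx=-0.21)
```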
The LibMultiSense C++ PointCloudUtility or Python PointCloudUtility provides a dependency-free example of how to create 3D point clouds colorized with the aux image using this speed approximation.
Create Depth Image in the Aux Camera Frame
A common workflow is to run an object detector on the aux color images, and find the associated 3D point or depth for detected objects. This requires the MultiSense 3D data be transformed into the aux camera frame. Since the mapping between aux pixels and disparity pixels is dependent on the position of the object in 3D, it’s often easier to transform the entire disparity image into a depth image in the aux camera frame, and then query the associated depths for the detection pixels.
It’s easiest to do this using the colorization speed approximation. Each disparity pixel can be converted to either a depth value or a 3D point using the 3D point generation routine, and a new image or buffer can be saved with that point stored at the corresponding aux pixel location \((u_{aux}, v_{aux})\).
Note
This approach only works for rectified aux images. If your application requires 3D data for detected objects in the unrectified aux camera frame, you should compute a 3D point cloud from the associated disparity data, and project each point into the unrectified aux camera using the camera’s intrinsics and extrinsics.
LibMultiSense Utilities
The C++ SaveImageUtility or Python SaveImageUtility can be used as a reference for how to create depth images in the aux camera frame without using external dependencies. They leverage the LibMultiSense utility function create_depth_image with the appropriate argument to perform the transformation and conversion.
Decode a Compressed Image to RGB
The following sections provide examples of how to convert compressed I-Frames streamed via compressed image topics into RGB images.
C++
The following example decoder uses FFmpeg to decode I-frames from MultiSense compressed streams. For convenience, OpenCV is used for the display of images.
#include <iostream>
#include <memory>
#include <stdexcept>
extern "C"
{
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
}
#include <opencv2/highgui.hpp>
class Decoder
{
public:
Decoder(AVCodecID codec, AVPixelFormat pixel_format)
: m_pixel_format(pixel_format),
m_bytes_per_pixel(pixel_format == AV_PIX_FMT_GRAY8 ? 1 : 3)
{
m_codec = avcodec_find_decoder(codec);
if (!m_codec)
{
throw std::runtime_error("Decoder: unable to find codec");
}
m_context = avcodec_alloc_context3(m_codec);
if (!m_context)
{
throw std::runtime_error("Decoder: unable to allocate context");
}
if (avcodec_open2(m_context, m_codec, nullptr) < 0)
{
throw std::runtime_error("Decoder: unable to open codec");
}
m_frame = av_frame_alloc();
if (!m_frame)
{
throw std::runtime_error("Decoder: unable to allocate frame");
}
av_init_packet(&m_packet);
}
~Decoder()
{
avcodec_free_context(&m_context);
av_frame_free(&m_frame);
av_packet_unref(&m_packet);
if (m_sws_context)
{
sws_freeContext(m_sws_context);
}
delete[] m_packed_image_data;
}
cv::Mat decode(uint8_t *data, size_t size)
{
m_packet.data = data;
m_packet.size = static_cast<int>(size);
if (avcodec_send_packet(m_context, &m_packet) < 0)
{
std::cerr << "Decoder: error sending packet" << std::endl;
av_init_packet(&m_packet);
return {};
}
if (avcodec_receive_frame(m_context, m_frame) < 0)
{
std::cerr << "Decoder: invalid frame" << std::endl;
av_init_packet(&m_packet);
return {};
}
if (!m_sws_context)
{
m_sws_context = sws_getContext(m_frame->width,
m_frame->height,
static_cast<AVPixelFormat>(m_frame->format),
m_frame->width,
m_frame->height,
m_pixel_format,
0,
nullptr,
nullptr,
nullptr);
if (!m_sws_context)
{
std::cerr << "Decoder: unable to initialize sws context" << std::endl;
return {};
}
m_packed_image_data = new uint8_t[m_frame->width * m_frame->height * m_bytes_per_pixel];
}
// Convert encoded frame to the desired pixel format
const int output_stride = m_frame->width * static_cast<int>(m_bytes_per_pixel);
uint8_t *dst[] = {m_packed_image_data};
int dst_stride[] = {output_stride};
sws_scale(m_sws_context,
m_frame->data,
m_frame->linesize,
0,
m_frame->height,
dst,
dst_stride);
av_init_packet(&m_packet);
const cv::Mat image(m_frame->height,
m_frame->width,
m_bytes_per_pixel == 3 ? CV_8UC3 : CV_8UC1,
m_packed_image_data);
return image.clone(); // return an owning copy
}
private:
AVCodec *m_codec = nullptr;
AVCodecContext *m_context = nullptr;
AVFrame *m_frame = nullptr;
AVPacket m_packet;
SwsContext *m_sws_context = nullptr;
AVPixelFormat m_pixel_format = AV_PIX_FMT_BGR24;
uint8_t *m_packed_image_data = nullptr;
size_t m_bytes_per_pixel = 3;
};
int main()
{
// Create one decoder per stream being decoded
auto color_decoder = std::make_unique<Decoder>(AV_CODEC_ID_H264, AV_PIX_FMT_BGR24);
auto grayscale_decoder = std::make_unique<Decoder>(AV_CODEC_ID_H264, AV_PIX_FMT_GRAY8);
// Acquire payloads from the MultiSense callback or a log file
uint8_t *color_image_data_ptr = ...;
size_t color_data_length = ...;
uint8_t *grayscale_image_data_ptr = ...;
size_t grayscale_data_length = ...;
cv::Mat color_image = color_decoder->decode(color_image_data_ptr, color_data_length);
cv::Mat grayscale_image = grayscale_decoder->decode(grayscale_image_data_ptr, grayscale_data_length);
if (!color_image.empty())
{
cv::imshow("color", color_image);
}
if (!grayscale_image.empty())
{
cv::imshow("grayscale", grayscale_image);
}
cv::waitKey(1);
return 0;
}
Python
Install Python's av package on your system
pip install av
Decode the data transmitted in the Image Header’s imageDataP payload using PyAV. The input data buffer can be read from a log file or from disk as a Python array.
import av
data = ...
codec = av.CodecContext.create("h264", "r")
packets = av.packet.Packet(data)
image = codec.decode(packets)[0].to_image()
Rectify Images Using OpenCV
Note
S27, S30, and KS21 cameras support onboard rectification of all image streams, including aux camera sources. It’s recommended that you use these rectified streams instead of rectifying in a userspace application.
Raw distorted images from the MultiSense can be rectified using the stereo calibration parameters stored on the camera. The following examples use built-in OpenCV functions to generate rectification lookup tables used to remap raw distorted images to rectified images.
Note
The fx, fy, cx, and cy members in the M and P calibration matrices will need to be scaled based on the camera’s operating resolution before generating OpenCV rectification lookup tables.
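As a sketch with hypothetical numbers, scaling a full-resolution M matrix down to a half-resolution operating mode looks like:

```python
import numpy as np

# hypothetical: calibration stored at 1920x1200, camera operating at 960x600
x_scale = 960 / 1920
y_scale = 600 / 1200

M = np.array([[600.0, 0.0, 480.0],   # fx, 0, cx
              [0.0, 600.0, 300.0],   # 0, fy, cy
              [0.0, 0.0, 1.0]])

# scale the x-axis terms by the width ratio and the y-axis terms by the height ratio
M_scaled = M.copy()
M_scaled[0, 0] *= x_scale  # fx
M_scaled[0, 2] *= x_scale  # cx
M_scaled[1, 1] *= y_scale  # fy
M_scaled[1, 2] *= y_scale  # cy
```

The same scaling applies to the fx, fy, cx, and cy entries of the P matrix.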
C++
#include <opencv2/opencv.hpp>
// Use the 3x3 M projection matrix from the intrinsic calibration
cv::Mat M = ...;
// Use D distortion from the intrinsic calibration (5 or 8 parameters in the standard OpenCV ordering)
cv::Mat D = ...;
// Use the 3x3 R rotation matrix from the extrinsic calibration
cv::Mat R = ...;
// Use the 3x4 P camera projection matrix from the extrinsic calibration
cv::Mat P = ...;
// Width of the output rectified image
int width = ...;
// Height of the output rectified image
int height = ...;
// Output lookup tables for the x and y output image dimensions
cv::Mat remap_x;
cv::Mat remap_y;
cv::initUndistortRectifyMap(M, D, R, P, cv::Size(width, height), CV_32FC1, remap_x, remap_y);
cv::Mat unrectified_image = cv::imread("unrectified_image.png");
cv::Mat rectified_image;
// Interpolate using the remap lookup tables using bi-linear interpolation
cv::remap(unrectified_image, rectified_image, remap_x, remap_y, cv::INTER_LINEAR);
cv::imwrite("rectified.png", rectified_image);
Python
import cv2
import numpy as np
# Use the 3x3 M projection matrix from the intrinsic calibration
M = np.array([...])
# Use D distortion from the intrinsic calibration (5 or 8 parameters in the standard OpenCV ordering)
D = np.array([...])
# Use the 3x3 R rotation matrix from the extrinsic calibration
R = np.array([...])
# Use the 3x4 P camera projection matrix from the extrinsic calibration
P = np.array([...])
# Width of the output rectified image
width = ...
# Height of the output rectified image
height = ...
unrectified_image = cv2.imread('unrectified_image.png')
map_x, map_y = cv2.initUndistortRectifyMap(M, D, R, P, (width, height), cv2.CV_32FC1)
# Apply the transformation to the image
rectified_image = cv2.remap(unrectified_image, map_x, map_y, cv2.INTER_LINEAR)
cv2.imwrite('rectified.png', rectified_image)