Reproject Disparity Images to 3D Point Clouds

Disparity images can be converted to 3D point clouds using a Q matrix assembled from the MultiSense's factory camera calibration.

The Q matrix can be assembled from the stereo calibration's P matrices. The equation below is the general definition of a Q matrix for non-isotropic pixels.

\[ \begin{align}\begin{aligned}\begin{split}Q = \begin{bmatrix} f_yT_x & 0 & 0 & -f_yc_xT_x \\ 0 & f_xT_x & 0 & -f_xc_yT_x \\ 0 & 0 & 0 & f_xf_yT_x \\ 0 & 0 & -f_y & f_y(c_x-c'_x) \end{bmatrix}\end{split}\\ \text{}\\f_x = \text{x focal length from the left P matrix}\\f_y = \text{y focal length from the left P matrix}\\c_x = \text{The x central pixel from the left P matrix}\\c_y = \text{The y central pixel from the left P matrix}\\T_x = \text{The x baseline from the right P matrix}\\c'_x = \text{The x central pixel from the right P matrix}\end{aligned}\end{align} \]
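
For reference, these terms can be read directly out of the left and right 3x4 P matrices. The following is a minimal Python sketch, assuming p_left and p_right are numpy arrays holding the rectified projection matrices from the stereo calibration and that they follow the standard OpenCV convention where the right P matrix stores \(f_x * T_x\) in its last column; the variable names are illustrative only.

import numpy as np

# Hypothetical 3x4 rectified projection matrices from the stereo calibration
p_left = np.array([...])
p_right = np.array([...])

fx = p_left[0, 0]         # x focal length from the left P matrix
fy = p_left[1, 1]         # y focal length from the left P matrix
cx = p_left[0, 2]         # x central pixel from the left P matrix
cy = p_left[1, 2]         # y central pixel from the left P matrix
cx_prime = p_right[0, 2]  # x central pixel from the right P matrix
tx = p_right[0, 3] / p_right[0, 0]  # x baseline from the right P matrix (negative, in meters)

Q = np.array([[fy * tx, 0.0,     0.0,  -fy * cx * tx],
              [0.0,     fx * tx, 0.0,  -fx * cy * tx],
              [0.0,     0.0,     0.0,  fx * fy * tx],
              [0.0,     0.0,     -fy,  fy * (cx - cx_prime)]])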

When the rectified focal lengths \(f_x\) and \(f_y\) are equal (\(f_x = f_y = f\)), the Q matrix can be simplified to the following representation by dividing each element of the non-isotropic representation by \(f * T_x\):

\[\begin{split}Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -\frac{1}{T_x} & \frac{(c_x-c'_x)}{T_x} \\ \end{bmatrix}\end{split}\]

The Q matrix can be used directly to convert each pixel of the disparity image to a corresponding 3D point in the left rectified camera’s coordinate frame. This can be done using the following expression

Note

For most applications \((c_x - c'_x)\) will be zero since the \(c_x\) from the left P matrix will match the \(c'_x\) value from the right P matrix

\[ \begin{align}\begin{aligned}\begin{split}\begin{bmatrix} x * \beta \\ y * \beta \\ z * \beta \\ \beta \end{bmatrix} = Q * \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix} \text{}\end{split}\\x = \text{x location of the 3D point in meters}\\y = \text{y location of the 3D point in meters}\\z = \text{z location of the 3D point in meters}\\\beta = \text{arbitrary scale factor which should be removed with normalization}\\u = \text{width index of the corresponding disparity pixel}\\v = \text{height index of the corresponding disparity pixel}\\d = \text{scaled floating point disparity value in pixels}\end{aligned}\end{align} \]

Note

Disparity images transmitted from the MultiSense are quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a 3D point cloud, the disparity image needs to be converted into a floating point pixel representation.

C++

Eigen

This example was developed with Eigen v3.4.0

// assume we have our disparity image stored in a standard C++ container in row-major order
std::vector<uint16_t> disparity = {...};

// assume these values are loaded from the left and right P matrices of the stereo calibration
const double fx = ...;
const double fy = ...;
const double cx = ...;
const double cy = ...;
const double tx = ...;
const double cx_prime = ...;
Eigen::Matrix4d Q;
Q << fy*tx, 0.0, 0.0, -fy*cx*tx,
     0.0, fx*tx, 0.0, -fx*cy*tx,
     0.0, 0.0, 0.0, fx*fy*tx,
     0.0, 0.0, -fy, fy*(cx - cx_prime);

// width and height of the disparity image
const size_t width = ...;
const size_t height = ...;

std::vector<Eigen::Vector3d> points;

for (size_t v = 0 ; v < height ; ++v)
{
    for (size_t u = 0 ; u < width ; ++u)
    {
        const uint16_t raw_disparity = disparity[v * width + u];

        // Skip invalid disparity values
        if (raw_disparity == 0)
        {
            continue;
        }

        const double d = static_cast<double>(raw_disparity) / 16.0;

        const Eigen::Vector3d point = (Q * Eigen::Vector4d{static_cast<double>(u), static_cast<double>(v), d, 1.0}).hnormalized();
        points.emplace_back(point);
    }
}

No External Dependencies

The PointCloudUtility can be used as a reference for how to reproject points without depending on an external matrix library

Python

import numpy as np

# assume we have our disparity image stored in a flat numpy array of type np.uint16 in row-major order
disparity = np.array([...])

# assume this is loaded from the extrinsics calibration
Q = np.array([...])

points = []

for v in range(height):
    for u in range(width):
        d = disparity[v * width + u]
        if d == 0:
            continue
        scaled_point = Q @ np.array([float(u), float(v), float(d) / 16.0, 1.0])
        # Normalize by the homogeneous coordinate to recover the 3D point
        point = scaled_point[0:3] / scaled_point[3]
        points.append(point)
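
For full-resolution images, the same reprojection can be performed without an explicit per-pixel loop. The following is a minimal vectorized sketch, assuming disparity is the flat np.uint16 array from above and Q is the 4x4 numpy array assembled from the calibration.

# Build homogeneous [u, v, d, 1] columns for every pixel at once
u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                   np.arange(height, dtype=np.float64))
d = disparity.reshape(height, width).astype(np.float64) / 16.0

# Invalid disparity values of 0 are excluded from the reprojection
valid = d > 0

uvd1 = np.stack([u[valid], v[valid], d[valid], np.ones(np.count_nonzero(valid))])

# Reproject every valid pixel and normalize by the homogeneous coordinate
scaled = Q @ uvd1
points = (scaled[0:3] / scaled[3]).T  # N x 3 array of [x, y, z] values in meters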

The OpenCV function cv::reprojectImageTo3D can also be used to convert disparity images to 3D points. This is done using the 4x4 perspective transformation Q matrix.

OpenCV C++

// multisense_disparity is a unsigned 16bit disparity image directly transmitted from the MultiSense. It's assumed
// this is loaded into the program as a cv::Mat with a datatype of CV_16UC1. reprojectImageTo3D takes
// floating point disparity images as input, so ensure we convert to the floating point pixel representation for
// the disparity before calling reprojectImageTo3D
cv::Mat multisense_disparity = ...;

cv::Mat disparity;
multisense_disparity.convertTo(disparity, CV_32FC1, 1.0/16.0);

cv::Mat point_cloud;
cv::reprojectImageTo3D(disparity, point_cloud, Q);

OpenCV Python

# multisense_disparity is a unsigned 16bit disparity image directly transmitted from the MultiSense. It's assumed
# this is loaded into the program as a numpy array with a datatype of np.uint16. reprojectImageTo3D takes
# floating point disparity images as input, so ensure we convert to the floating point pixel representation for
# the disparity before calling reprojectImageTo3D
multisense_disparity = np.array([...])

disparity = multisense_disparity.astype(np.float32) * (1.0 / 16.0)

point_cloud = cv2.reprojectImageTo3D(disparity, Q)

Point Clouds with Intensity

The disparity image maps 1:1 with the left luma rectified image. The PointCloudUtility provides an example of how to overlay intensity data from the left luma rectified image onto a 3D point cloud without using any 3rd party dependencies like OpenCV.
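
The following is a minimal Python sketch of the same idea, assuming disparity and luma_rectified are numpy arrays of shape (height, width), with the disparity stored as np.uint16, and Q is the 4x4 reprojection matrix described above.

points_with_intensity = []

for v in range(height):
    for u in range(width):
        raw = disparity[v, u]

        # Skip invalid disparity values
        if raw == 0:
            continue

        # Reproject the pixel and normalize by the homogeneous coordinate
        scaled = Q @ np.array([float(u), float(v), float(raw) / 16.0, 1.0])
        xyz = scaled[0:3] / scaled[3]

        # The disparity image maps 1:1 with the left luma rectified image, so the
        # intensity for this point is simply the luma value at the same pixel
        points_with_intensity.append((xyz, luma_rectified[v, u]))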

Convert Disparity Images to Depth Images

A disparity image can be converted to a depth image using the relationship between the MultiSense camera’s baseline and rectified focal length.

The following depth formula can be evaluated for every disparity pixel using \(f_x\) from the left P matrix and \(T_x\) from the right P matrix of the stereo calibration.

\[ \begin{align}\begin{aligned}baseline = -T_x\\depth = \frac{f_x * baseline}{disparity}\end{aligned}\end{align} \]

Note

Disparity images from the MultiSense are transmitted quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a depth image, the disparity image needs to be converted into a floating point pixel representation.

This operation is equivalent to taking only the z values from a reprojected 3D point cloud:

\[z = \frac {f_x * f_y * T_x}{-f_y * d}\]

Warning

Invalid disparity values of 0 should be ignored when converting a disparity image to a depth image. Depth for disparity values of 0 is undefined.
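
The following is a minimal numpy sketch of this conversion, assuming multisense_disparity is the raw np.uint16 disparity image and fx and tx are taken from the left and right P matrices as described above.

# Convert the 1/16th pixel quantized disparity to a floating point pixel representation
disparity_pixels = multisense_disparity.astype(np.float32) / 16.0

baseline = -tx

# Only compute depth where the disparity is valid; invalid pixels are left at 0
depth = np.zeros_like(disparity_pixels)
valid = disparity_pixels > 0
depth[valid] = (fx * baseline) / disparity_pixels[valid]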

Create Color Images

The chroma image contains the CbCr channels of the planar YCbCr420 compression scheme. An RGB color image can be created using the corresponding luma image.

The OpenCV function cv::cvtColorTwoPlane can be used to convert corresponding luma and chroma images to BGR images:

C++

// Convert from YUV to BGR color space
cv::Mat bgr;
cv::cvtColorTwoPlane(luma, chroma, bgr, cv::COLOR_YUV2BGR_NV12);

Python

# Convert from YUV to BGR color space
bgr = cv2.cvtColorTwoPlane(luma, chroma, cv2.COLOR_YUV2BGR_NV12)

You can reference the LibMultiSense SaveImageUtility for an example of how to convert a luma and chroma image pair into an RGB color image without any 3rd party dependencies.
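
As an illustration of what such a conversion involves, the sketch below samples the interleaved CbCr plane for a single pixel and applies full-range BT.601 conversion constants. The NV12-style layout and the exact coefficients are assumptions here; the SaveImageUtility remains the authoritative reference.

# Assumed layout: luma is a (height, width) uint8 array and chroma is a
# (height // 2, width) uint8 array holding interleaved Cb/Cr pairs (NV12)
def ycbcr420_to_rgb(luma, chroma, u, v):
    y = float(luma[v, u])
    cb = float(chroma[v // 2, (u // 2) * 2]) - 128.0
    cr = float(chroma[v // 2, (u // 2) * 2 + 1]) - 128.0

    # Full-range BT.601 YCbCr to RGB conversion (assumed constants)
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb

    # Clamp to the valid 8-bit range
    return tuple(int(max(0.0, min(255.0, c))) for c in (r, g, b))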

Create a Color 3D Point Cloud

Each MultiSense's main left/right stereo pair consists of two mono cameras. This configuration was chosen to optimize stereo performance over a variety of environmental conditions. Since many applications require color images, certain models of stereo cameras (MultiSense S27, MultiSense S30) include a third wide-angle aux color camera. All three cameras have calibrated intrinsics and extrinsics, which allows 3D points computed from the main stereo disparity to be colorized using data from the aux color camera.

To colorize a point cloud computed from the MultiSense disparity image, each 3D point must be projected into the aux rectified color image using the aux extrinsic projection matrix.

\[\begin{split}\begin{bmatrix} \beta * u_{aux} \\ \beta * v_{aux} \\ \beta \end{bmatrix} = P_{aux} * \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}\end{split}\]

The LibMultiSense SaveImageUtility provides a dependency-free example of how to project 3D points computed from the disparity image into the aux rectified image for colorization.

Approximation For Execution Speed

In practice both \(T_y\) and \(T_z\) translations in the aux camera extrinsic calibration are very small (less than 1mm). By assuming both values are zero, the corresponding aux color pixel can be determined using the ratio of the aux stereo baseline and the primary stereo baseline.

\[ \begin{align}\begin{aligned}u_{aux} = u_{left} - \frac{T_{x_{aux}}}{T_{x_{right}}} * disparity\\v_{aux} = v_{left}\end{aligned}\end{align} \]
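
A minimal Python sketch of this approximation is shown below. It assumes the disparity has already been converted to floating point pixels, aux_rectified is the aux rectified color image, and tx_aux and tx_right are the x baselines from the aux and right P matrices; the variable names are illustrative only.

colors = []

for v in range(height):
    for u in range(width):
        d = disparity_pixels[v, u]

        # Skip invalid disparity values
        if d == 0:
            continue

        # Shift the left pixel coordinate by the ratio of the aux and right baselines
        u_aux = int(round(u - (tx_aux / tx_right) * d))
        v_aux = v

        # Skip points which fall outside of the aux image
        if u_aux < 0 or u_aux >= aux_width:
            continue

        colors.append(aux_rectified[v_aux, u_aux])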

Decode a Compressed Image to RGB

The following sections provide examples of how to convert compressed I-Frames streamed via compressed image topics into RGB images.

C++

The following example decoder uses ffmpeg to decode I-frames from MultiSense compressed streams. For convenience, OpenCV is used to display the decoded images.

#include <iostream>

extern "C"
{
    #include <libavcodec/avcodec.h>
    #include <libswscale/swscale.h>
}

#include <opencv2/highgui.hpp>

class Decoder
{
public:

    Decoder(const AVCodecID &codec, const AVPixelFormat &pixel_format):
        m_pixel_format(pixel_format),
        m_bytes_per_pixel(pixel_format == AV_PIX_FMT_GRAY8 ? 1 : 3)
    {
        m_codec = avcodec_find_decoder(codec);
        if (!m_codec)
        {
            throw std::runtime_error("Decoder: unable to find codec");
        }

        m_context = avcodec_alloc_context3(m_codec);
        if(!m_context)
        {
            throw std::runtime_error("Decoder: unable to allocate context");
        }

        if (avcodec_open2(m_context, m_codec, nullptr) < 0)
        {
            throw std::runtime_error("Decoder: unable to open codec");
        }

        m_frame = av_frame_alloc();
        if(!m_frame)
        {
            throw std::runtime_error("Decoder: unable to allocate frame");
        }

        av_init_packet(&m_packet);
    }

    ~Decoder()
    {
        avcodec_free_context(&m_context);
        av_frame_free(&m_frame);
        av_packet_unref(&m_packet);

        if (m_sws_context)
        {
            sws_freeContext(m_sws_context);
        }

        if (m_packed_image_data)
        {
            delete[] m_packed_image_data;
        }
    }

    cv::Mat decode(uint8_t *data, size_t size)
    {
        m_packet.data = data;
        m_packet.size = size;

        if (avcodec_send_packet(m_context, &m_packet) < 0)
        {
            std::cerr << "Decoder: error sending packet" << std::endl;
            av_init_packet(&m_packet);
            return {};
        }

        if (avcodec_receive_frame(m_context, m_frame) < 0)
        {
            std::cerr << "Decoder: invalid frame" << std::endl;
            av_init_packet(&m_packet);
            return {};
        }

        if (!m_sws_context)
        {
            m_sws_context = sws_getContext(m_frame->width,
                                           m_frame->height,
                                           static_cast<AVPixelFormat>(m_frame->format),
                                           m_frame->width,
                                           m_frame->height,
                                           m_pixel_format,
                                           0,
                                           nullptr,
                                           nullptr,
                                           nullptr);

            if (!m_sws_context)
            {
                std::cerr << "Decoder: unable to initialize sws context" << std::endl;
                return {};
            }

            m_packed_image_data = new uint8_t[m_frame->width * m_frame->height * m_bytes_per_pixel];
        }

        //
        // Convert encoded frame to desired output format
        int output_stride = m_frame->width * m_bytes_per_pixel;

        sws_scale(m_sws_context,
                  m_frame->data,
                  m_frame->linesize,
                  0,
                  m_frame->height,
                  &m_packed_image_data,
                  &output_stride);

        av_init_packet(&m_packet);

        const cv::Mat image(m_frame->height, m_frame->width, m_bytes_per_pixel == 3 ? CV_8UC3 : CV_8UC1, m_packed_image_data);

        return image.clone();
    }

private:
    AVCodec *m_codec = nullptr;
    AVCodecContext *m_context = nullptr;
    AVFrame *m_frame = nullptr;
    AVPacket m_packet;
    SwsContext *m_sws_context = nullptr;
    AVPixelFormat m_pixel_format = AV_PIX_FMT_BGR24;
    uint8_t *m_packed_image_data = nullptr;
    size_t m_bytes_per_pixel = 3;
};

int main()
{
    //
    // Note a unique decoder needs to be constructed for each MultiSense image stream which is being decoded
    auto color_decoder = std::make_unique<Decoder>(AV_CODEC_ID_H264, AV_PIX_FMT_BGR24);
    auto grayscale_decoder = std::make_unique<Decoder>(AV_CODEC_ID_H264, AV_PIX_FMT_GRAY8);

    //
    // Initialize and capture data from the MultiSense, or read data in from file
    ...

    //
    // If data is being received in realtime from a MultiSense, the following code should be moved
    // to an image callback
    const auto color_image = color_decoder->decode(color_image_data_ptr, color_data_length);
    const auto grayscale_image = grayscale_decoder->decode(grayscale_image_data_ptr, grayscale_data_length);

    cv::imshow("color", color_image);
    cv::waitKey(1);

    cv::imshow("grayscale", grayscale_image);
    cv::waitKey(1);

    return 0;
}

Python

Install Python's av package on your system

pip install av

Decode the data transmitted in the Image Header’s imageDataP payload using pyav. The input data buffer can be read from a log file or from disk as a Python array.

import av

data = ...

codec = av.CodecContext.create("h264", "r")
packets = av.packet.Packet(data)
image = codec.decode(packets)[0].to_image()

Rectify Images Using OpenCV

Note

S27, S30, and KS21 cameras support onboard rectification of all image streams including aux camera sources. It’s recommended you use these rectified streams instead of rectifying with a userspace application

Raw distorted images from the MultiSense can be rectified using the stereo calibration parameters stored on the camera. The following examples use built-in OpenCV functions to generate rectification lookup tables used to remap raw distorted images to rectified images.

Note

The fx, fy, cx, and cy members in the M and P calibration matrices will need to be scaled based on the camera’s operating resolution before generating OpenCV rectification lookup tables
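
For example, if the calibration was generated at the camera's full resolution, the matrices can be scaled by the ratio of the operating resolution to the calibration resolution before building the lookup tables. The following Python sketch assumes M and P are numpy arrays and that calibration_width and calibration_height describe the resolution the calibration was generated at; the exact scaling depends on how your calibration was stored.

x_scale = operating_width / calibration_width
y_scale = operating_height / calibration_height

M_scaled = M.copy()
M_scaled[0, 0] *= x_scale  # fx
M_scaled[0, 2] *= x_scale  # cx
M_scaled[1, 1] *= y_scale  # fy
M_scaled[1, 2] *= y_scale  # cy

P_scaled = P.copy()
P_scaled[0, 0] *= x_scale  # fx
P_scaled[0, 2] *= x_scale  # cx
P_scaled[1, 1] *= y_scale  # fy
P_scaled[1, 2] *= y_scale  # cy

# If the P matrix stores its translation term as fx * Tx (standard OpenCV
# convention), that element scales with fx as well
P_scaled[0, 3] *= x_scale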

C++

#include <opencv2/opencv.hpp>

// Use the 3x3 M projection matrix from the intrinsic calibration
cv::Mat M = ...;

// Use D distortion from the intrinsic calibration (5 or 8 parameters in the standard OpenCV ordering)
cv::Mat D = ...;

// Use the 3x3 R rotation matrix from the extrinsic calibration
cv::Mat R = ...;

// Use the 3x4 P camera projection matrix from the extrinsic calibration
cv::Mat P = ...;

// Width of the output rectified image
int width = ...;

// Height of the output rectified image
int height = ...;

// Output lookup tables for the x and y output image dimensions
cv::Mat remap_x;
cv::Mat remap_y;

cv::initUndistortRectifyMap(M, D, R, P, cv::Size(width, height), CV_32FC1, remap_x, remap_y);

cv::Mat unrectified_image = cv::imread("unrectified_image.png");

cv::Mat rectified_image;

// Remap the image using the lookup tables with bi-linear interpolation
cv::remap(unrectified_image, rectified_image, remap_x, remap_y, cv::INTER_LINEAR);

cv::imwrite("rectified.png", rectified_image);

Python

import cv2
import numpy as np

# Use the 3x3 M projection matrix from the intrinsic calibration
M = np.array([...])

# Use D distortion from the intrinsic calibration (5 or 8 parameters in the standard OpenCV ordering)
D = np.array([...])

# Use the 3x3 R rotation matrix from the extrinsic calibration
R = np.array([...])

# Use the 3x4 P camera projection matrix from the extrinsic calibration
P = np.array([...])

# Width of the output rectified image
width = ...

# Height of the output rectified image
height = ...

unrectified_image = cv2.imread('unrectified_image.png')

map_x, map_y = cv2.initUndistortRectifyMap(M, D, R, P, (width, height), cv2.CV_32FC1)

# Apply the transformation to the image
rectified_image = cv2.remap(unrectified_image, map_x, map_y, cv2.INTER_LINEAR)

cv2.imwrite('rectified.png', rectified_image)