Reproject Disparity Images to 3D Point Clouds

Disparity images can be converted to 3D point clouds with the Q matrix, which is assembled from the MultiSense's factory camera calibration.

The Q matrix can be assembled from the stereo calibration's P matrices. The equation below gives a general definition of the Q matrix for non-isotropic pixels.

\[\begin{split}Q = \begin{bmatrix} f_yT_x & 0 & 0 & -f_yc_xT_x \\ 0 & f_xT_x & 0 & -f_xc_yT_x \\ 0 & 0 & 0 & f_xf_yT_x \\ 0 & 0 & -f_y & f_y(c_x-c'_x) \end{bmatrix}\end{split}\]

where

\(f_x\) = x focal length from the left P matrix
\(f_y\) = y focal length from the left P matrix
\(c_x\) = x central pixel from the left P matrix
\(c_y\) = y central pixel from the left P matrix
\(T_x\) = x baseline from the left P matrix
\(c'_x\) = x central pixel from the right P matrix

When the rectified focal lengths \(f_x\) and \(f_y\) are equal (\(f_x = f_y = f\)), the Q matrix can be simplified to the following representation by dividing each element of the non-isotropic representation by \(f * T_x\):

\[\begin{split}Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -\frac{1}{T_x} & \frac{(c_x-c'_x)}{T_x} \\ \end{bmatrix}\end{split}\]
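
For illustration, a simplified Q matrix of this form could be assembled in Python from the scalar calibration values defined above. The function name and argument list below are placeholders, not part of the MultiSense API.

Python

import numpy as np

# A minimal sketch of building the simplified Q matrix, assuming f_x == f_y == f.
# The scalar inputs (f, cx, cy, tx, cx_prime) correspond to the calibration terms
# defined above; the function name is illustrative.
def make_q(f, cx, cy, tx, cx_prime):
    return np.array([
        [1.0, 0.0,  0.0,       -cx],
        [0.0, 1.0,  0.0,       -cy],
        [0.0, 0.0,  0.0,         f],
        [0.0, 0.0, -1.0 / tx, (cx - cx_prime) / tx],
    ])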

The Q matrix can be used directly to convert each pixel of the disparity image to a corresponding 3D point in the left rectified camera's coordinate frame. This can be done using the following expression:

\[\begin{split}\begin{bmatrix} x * \beta \\ y * \beta \\ z * \beta \\ \beta \end{bmatrix} = Q * \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}\end{split}\]

where

\(x\) = x location of the 3D point in meters
\(y\) = y location of the 3D point in meters
\(z\) = z location of the 3D point in meters
\(\beta\) = arbitrary scale factor which is removed with normalization
\(u\) = width (column) index of the corresponding disparity pixel
\(v\) = height (row) index of the corresponding disparity pixel
\(d\) = scaled floating point disparity value in pixels

Note

Disparity images transmitted from the MultiSense are quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a 3D point cloud, the disparity image needs to be converted into a floating point pixel representation.

C++

#include <cstdint>
#include <vector>

#include <Eigen/Dense>

// assume we have our disparity image stored in a standard C++ container in row-major order,
// with dimensions width x height
std::vector<uint16_t> disparity = {...};

// assume this is loaded from the extrinsics calibration
Eigen::Matrix4d Q = {...};

std::vector<Eigen::Vector3d> points;

for (size_t v = 0 ; v < height ; ++v)
{
    for (size_t u = 0 ; u < width ; ++u)
    {
        const uint16_t raw_disparity = disparity[v * width + u];

        // Skip invalid disparity values
        if (raw_disparity == 0)
        {
            continue;
        }

        const double d = static_cast<double>(raw_disparity) / 16.0;

        const Eigen::Vector3d point = (Q * Eigen::Vector4d{static_cast<double>(u), static_cast<double>(v), d, 1.0}).hnormalized();
        points.emplace_back(point);
    }
}

Python

import numpy as np

# assume we have our disparity image stored in a flat numpy array of type np.uint16 in
# row-major order, with dimensions width x height
disparity = np.array([...])

# assume this is loaded from the extrinsics calibration as a 4x4 array
Q = np.array([...])

points = []

for v in range(height):
    for u in range(width):
        d = disparity[v * width + u]

        # Skip invalid disparity values
        if d == 0:
            continue

        scaled_point = Q @ np.array([float(u), float(v), float(d) / 16.0, 1.0])

        # divide by the scale factor to recover the 3D point
        point = scaled_point[0:3] / scaled_point[3]
        points.append(point)

The OpenCV function cv::reprojectImageTo3D can also be used to convert disparity images to 3D points. This is done using the 4x4 perspective transformation Q matrix.

C++

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// multisense_disparity is an unsigned 16-bit disparity image directly transmitted from the MultiSense. It's assumed
// this is loaded into the program as a cv::Mat with a datatype of CV_16UC1. reprojectImageTo3D takes
// floating point disparity images as input, so ensure we convert to the floating point pixel representation for
// the disparity before calling reprojectImageTo3D. Q is assumed to be a cv::Mat holding the 4x4 Q matrix
cv::Mat multisense_disparity = ...;

cv::Mat disparity;
multisense_disparity.convertTo(disparity, CV_32FC1, 1.0/16.0);

cv::Mat point_cloud;
cv::reprojectImageTo3D(disparity, point_cloud, Q);

Python

import cv2
import numpy as np

# multisense_disparity is an unsigned 16-bit disparity image directly transmitted from the MultiSense. It's assumed
# this is loaded into the program as a numpy array with a datatype of np.uint16. reprojectImageTo3D takes
# floating point disparity images as input, so ensure we convert to the floating point pixel representation for
# the disparity before calling reprojectImageTo3D
multisense_disparity = np.array([...])

disparity = multisense_disparity.astype(np.float32) * (1.0 / 16.0)

point_cloud = cv2.reprojectImageTo3D(disparity, Q)

Point Clouds with Intensity

The disparity image maps 1-1 with the left rectified luma image. The PointCloudUtility provides an example of how to use the left rectified luma image to overlay intensity data onto a 3D point cloud without using any 3rd party dependencies like OpenCV.
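
For reference, a minimal sketch of this approach (not the PointCloudUtility itself) is shown below; the array shapes and the function name are assumptions.

Python

import numpy as np

# A sketch of overlaying intensity onto a point cloud. disparity is an HxW np.uint16 array,
# luma_rectified is the HxW left rectified luma image, and Q is the 4x4 reprojection matrix.
# The function name and argument layout are illustrative.
def intensity_point_cloud(disparity, luma_rectified, Q):
    height, width = disparity.shape
    points = []

    for v in range(height):
        for u in range(width):
            raw_disparity = disparity[v, u]

            # Skip invalid disparity values
            if raw_disparity == 0:
                continue

            scaled = Q @ np.array([float(u), float(v), raw_disparity / 16.0, 1.0])
            x, y, z = scaled[0:3] / scaled[3]

            # the disparity image maps 1-1 with the left rectified luma image
            points.append((x, y, z, luma_rectified[v, u]))

    return points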

Convert Disparity Images to Depth Images

A disparity image can be converted to a depth image using the relationship between the MultiSense camera’s baseline and rectified focal length.

The following depth formula can be evaluated for every disparity pixel using \(f_x\) and \(T_x\) from the stereo calibration’s extrinsic P Matrix.

\[baseline = -T_x\]

\[depth = \frac{f_x * baseline}{disparity}\]

Note

Disparity images from the MultiSense are transmitted quantized to 1/16th of a pixel and stored as unsigned 16-bit integers. Before converting a disparity image to a depth image, the disparity image needs to be converted into a floating point pixel representation.

This operation is equivalent to taking only the z values from a reprojected 3D point cloud:

\[z = \frac{f_x * f_y * T_x}{-f_y * d}\]

Warning

Invalid disparity values of 0 should be ignored when converting a disparity image to a depth image. Depth for disparity values of 0 is undefined.
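
A vectorized sketch of this conversion in Python is shown below; the function name and scalar arguments are placeholders for values taken from the calibration.

Python

import numpy as np

# A sketch of converting a raw MultiSense disparity image to a depth image in meters.
# disparity_raw is an HxW np.uint16 array; fx and tx are placeholders for the rectified
# focal length and x baseline described above.
def disparity_to_depth(disparity_raw, fx, tx):
    disparity = disparity_raw.astype(np.float32) / 16.0  # convert to floating point pixels
    baseline = -tx

    depth = np.zeros_like(disparity)
    valid = disparity > 0  # depth is undefined where disparity is 0
    depth[valid] = (fx * baseline) / disparity[valid]

    return depth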

Create Color Images

The chroma image contains the CbCr channels of the planar YCbCr420 compression scheme. An RGB color image can be created using the corresponding luma image.

The OpenCV function cv::cvtColorTwoPlane can be used to convert corresponding luma and chroma images to BGR images:

C++

// Convert from YUV to BGR color space
cv::Mat bgr;
cv::cvtColorTwoPlane(luma, chroma, bgr, cv::COLOR_YUV2BGR_NV12);

Python

# Convert from YUV to BGR color space
bgr = cv2.cvtColorTwoPlane(luma, chroma, cv2.COLOR_YUV2BGR_NV12)

You can reference the LibMultiSense ColorImageUtility for an example of how to convert a luma and chroma image pair into an RGB color image without any 3rd party dependencies.

Create a Color 3D Point Cloud

Each MultiSense's main left/right stereo pair consists of two mono cameras. This configuration was chosen to optimize stereo performance over a variety of environmental conditions. Since many applications require color images, certain models (MultiSense S27, MultiSense S30) include a third wide-angle aux color camera. All three cameras have their intrinsics and extrinsics calibrated, which allows 3D points computed from the main stereo disparity to be colorized using data from the aux color camera.

To colorize a point cloud computed from the MultiSense disparity image, each 3D point must be projected into the aux rectified color image using the aux extrinsic projection matrix.

\[\begin{split}\begin{bmatrix} \beta * u_{aux} \\ \beta * v_{aux} \\ \beta \end{bmatrix} = P_{aux} * \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}\end{split}\]

The LibMultiSense DepthImageUtility provides a dependency-free example of how to project 3D points computed from the disparity image into the aux rectified image for colorization.
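
A minimal sketch of this projection in Python, assuming \(P_{aux}\) is available as a 3x4 numpy array and the 3D point is expressed in meters in the left rectified frame (the function name is illustrative):

Python

import numpy as np

# Project a 3D point from the left rectified frame into the aux rectified color image.
# P_aux is the 3x4 aux projection matrix; point_xyz is (x, y, z) in meters.
def project_to_aux(point_xyz, P_aux):
    scaled = P_aux @ np.array([point_xyz[0], point_xyz[1], point_xyz[2], 1.0])
    u_aux = scaled[0] / scaled[2]
    v_aux = scaled[1] / scaled[2]
    return u_aux, v_aux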

Approximation For Execution Speed

In practice both \(T_y\) and \(T_z\) translations in the aux camera extrinsic calibration are very small (less than 1mm). By assuming both values are zero, the corresponding aux color pixel can be determined using the ratio of the aux stereo baseline and the primary stereo baseline.

\[u_{aux} = u_{left} - \frac{T_{x_{aux}}}{T_{x_{right}}} * disparity\]

\[v_{aux} = v_{left}\]
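
A sketch of this approximation in Python is shown below; the names for the aux and right baselines and the aux image layout are assumptions.

Python

# A sketch of the baseline-ratio approximation. tx_aux and tx_right are the x translations of
# the aux and right cameras, disparity is the floating point disparity at (u_left, v_left), and
# aux_rectified_color is an HxWx3 aux rectified color image. All names here are illustrative.
def aux_color_for_pixel(u_left, v_left, disparity, tx_aux, tx_right, aux_rectified_color):
    u_aux = int(round(u_left - (tx_aux / tx_right) * disparity))
    v_aux = v_left

    height, width = aux_rectified_color.shape[:2]
    if 0 <= u_aux < width and 0 <= v_aux < height:
        return aux_rectified_color[v_aux, u_aux]

    return None  # the point does not project into the aux image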

Decode a Compressed Image to RGB

The following sections provide examples of how to convert compressed I-Frames streamed via compressed image topics into RGB images.

Decode Compressed Image Streams Examples

Rectify Images Using OpenCV

Note

S27, S30, and KS21 cameras support onboard rectification of all image streams, including aux camera sources. It's recommended that you use these rectified streams instead of rectifying in a userspace application.

Raw distorted images from the MultiSense can be rectified using the stereo calibration parameters stored on the camera. The following examples use built-in OpenCV functions to generate rectification lookup tables that remap raw distorted images to rectified images.

Note

The fx, fy, cx, and cy members in the M and P calibration matrices will need to be scaled based on the camera's operating resolution before generating OpenCV rectification lookup tables.
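
As an example, the sketch below scales a single camera's calibration and generates lookup tables with OpenCV. The function name, the resolution arguments, and the assumption that M, D, R, and P are available as numpy arrays are illustrative, not part of the MultiSense API.

Python

import cv2

# A sketch of generating rectification lookup tables for one camera. M is the 3x3 intrinsic
# matrix, D the distortion coefficients, R the 3x3 rectification rotation, and P the 3x4
# projection matrix from the stored calibration. calibration_res and operating_res are
# (width, height) tuples; all names here are illustrative.
def make_rectification_maps(M, D, R, P, calibration_res, operating_res):
    x_scale = operating_res[0] / calibration_res[0]
    y_scale = operating_res[1] / calibration_res[1]

    # scale the first row (fx, cx) and second row (fy, cy) to the operating resolution;
    # for P this also scales the fx * Tx term consistently with fx
    M_scaled = M.copy()
    M_scaled[0, :] *= x_scale
    M_scaled[1, :] *= y_scale

    P_scaled = P.copy()
    P_scaled[0, :] *= x_scale
    P_scaled[1, :] *= y_scale

    return cv2.initUndistortRectifyMap(M_scaled, D, R, P_scaled, operating_res, cv2.CV_32FC1)

# Usage sketch:
# map_x, map_y = make_rectification_maps(M, D, R, P, calibration_res, operating_res)
# rectified = cv2.remap(raw_image, map_x, map_y, cv2.INTER_LINEAR)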