This post is the second article in a series covering the FieldXR app, an extension for Salesforce Field Service with deep visualization functionality. In the first article, we raised the issue of collecting the 3D information about the real world that the app needs to operate correctly. In this article, we highlight the next two pain points: identifying the scanned object and calculating the position of the matched 3D model for display in the scanned object's coordinates.

*Pic. 1. The scanned object*

*Pic. 2. Model (red) displayed in the coordinates of the scanned object (blue)*

## Introduction

First, let's define the terminology.

**OBJ file** is the file received as a result of scanning with the Structure Sensor; it stores the 3D object's geometry. **Fragment** is the object scanned in real time (it can also include elements of the scene that fall into the cube of the scanning area).

**Model** is a 3D object stored in a database. **Database** is the Model storage. **Rendering** is the process of generating images from a 3D model in the app.

## Challenges

To identify the 3D object, we need to answer the following questions:

- What is given and what data should be prepared?
- How to compare two 3D objects?
- How to calculate the Model's displacement and rotation relative to the Fragment?

## Solution

### 1. Input data

As a result of scanning the object with the Structure Sensor, we receive a file with the OBJ extension. It stores the geometry of the 3D object; the values are represented in local 3D coordinates.

*Pic. 3. A sample OBJ file storing a 3D object*

How can we identify a Fragment knowing only its geometry? We need a descriptive database of prepared Models; the comparison algorithm then returns the Model most similar to the Fragment under consideration.

### 2. Preparing the Models database

At this stage, we scanned a set of dice. The result is the following collection of different 3D models:

*Pic. 4. 3D Models*

Further, using Blender, we removed unnecessary elements.

*Pic. 5. 3D Model (before editing)*

*Pic. 6. 3D Model (after editing)*

### 3. Algorithm for matching the Fragment and the Model

The field of computer vision offers a considerable number of algorithms for matching 3D objects. However, given that the user scans the Fragment at a certain angle, and the scan result yields a 2D projection of the Fragment, we decided to use the OpenCV approach for finding a known object in an image.

This algorithm works as follows.

It receives a Model file and a Fragment file as input and calculates keypoints for each image. Next, it computes a descriptor for each keypoint. Between the two descriptor sets, the algorithm finds the **n best matches**. The obtained array of points undergoes additional filtering, and then a homography (a perspective transformation between two arrays of points) is calculated. The algorithm returns the coordinates of the vertices of the 4-gon bounding the entire Fragment's projection on the Model's projection (in the figure, the vertices are connected by green lines, calculated from the perspective transformation of the points).

As a result of the comparison, the matched key points are displayed (matching pairs are connected by colored lines).

*Pic. 7. Good matches: Projection_Model.png & Projection_Fragment.png*
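The last step, applying the homography to the Model's corner points to obtain the bounding 4-gon, can be sketched as follows (the actual implementation is a C++ DLL built on OpenCV; this is a minimal numpy sketch, and the homography values and the 100×100 projection size are made up for illustration):

```python
import numpy as np

def perspective_transform(points, H):
    """Apply a 3x3 homography H to an array of 2D points
    (the same math cv2.perspectiveTransform performs)."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # back to Cartesian

# Corners of a hypothetical 100x100 Model projection.
corners = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)

# A made-up homography: translation by (20, 10) plus a slight perspective tilt.
H = np.array([[1.0, 0.0,   20.0],
              [0.0, 1.0,   10.0],
              [0.0, 0.001,  1.0]])

quad = perspective_transform(corners, H)  # 4-gon on the Fragment's projection
```

The four rows of `quad` are the vertices that the figures above connect with green lines.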

The algorithm works in 2D; how do we extend it to 3D?

Let's suppose we have a set of N projections of the Model stored in the database and a 2D projection of the Fragment. Using this search algorithm, we iterate over the Model's N projections, compare each one with the Fragment's projection, determine the best result, and return the name of that Model projection.

*Pic. 8. Good matches: Projection_Model.png & Projection_Fragment.png*
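The selection loop itself can be sketched like this (a minimal Python sketch; `count_good_matches` stands in for the OpenCV matching pipeline described above, and the projection file names and scores are hypothetical):

```python
# Hypothetical "good match" counts per Model projection file.
fake_scores = {
    "model_a90_x0y0z1.png": 14,
    "model_a45_x0y1z0.png": 37,
    "model_a00_x1y0z0.png": 9,
}

def count_good_matches(projection_name, fragment_name):
    # Stand-in for the real keypoint/descriptor matching; here it is
    # just a table lookup so the loop's logic can be shown.
    return fake_scores[projection_name]

def best_projection(projection_names, fragment_name):
    # Iterate over the Model's N projections and keep the best-scoring one.
    return max(projection_names,
               key=lambda name: count_good_matches(name, fragment_name))

winner = best_projection(list(fake_scores), "fragment.png")
```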

Let's add the generated 2D projections to the model database.

How do we create 2D projections of a 3D object? OpenGL tools come to the rescue.

The mechanism functions as follows: the 3D object is placed in the scene (local coordinates are converted to world coordinates) and the camera is set up (initial position, lens, orientation). Next, the object's projection is created (Model coordinates are converted to camera coordinates and then to 2D screen coordinates).

*Pic. 9. The process of creating a 2D projection*
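Under simplifying assumptions (a camera on the +Z axis and a square viewport), the model-to-screen chain can be sketched in a few lines of numpy; the matrices mimic the OpenGL view and perspective matrices but are not taken from the actual app:

```python
import numpy as np

def look_at_z(distance):
    """View matrix for a camera on the +Z axis looking at the origin
    (a minimal stand-in for the OpenGL camera setup)."""
    V = np.eye(4)
    V[2, 3] = -distance          # translate the world 'distance' units away
    return V

def perspective(f, near, far):
    """Simplified OpenGL-style perspective matrix (square viewport)."""
    P = np.zeros((4, 4))
    P[0, 0] = P[1, 1] = f
    P[2, 2] = (far + near) / (near - far)
    P[2, 3] = 2 * far * near / (near - far)
    P[3, 2] = -1.0
    return P

def project(point, M, V, P, width, height):
    """Model -> world -> camera -> clip -> screen, as described in the text."""
    p = np.array([*point, 1.0])
    clip = P @ V @ M @ p
    ndc = clip[:3] / clip[3]                 # perspective divide
    x = (ndc[0] * 0.5 + 0.5) * width         # viewport transform
    y = (ndc[1] * 0.5 + 0.5) * height
    return x, y, ndc[2]

M = np.eye(4)                                # Model already in world coordinates
x, y, depth = project((0.0, 0.0, 0.0), M,
                      look_at_z(5.0), perspective(2.0, 1.0, 10.0), 640, 480)
```

A point at the origin lands in the middle of the 640×480 viewport, as expected.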

Then, using quaternions, we rotate the Model (the rotation angle and axis are set by the user). The Model projections generated this way are saved to the database.

Given that the problem of calculating the Model's displacement and rotation relative to the Fragment also has to be solved, it makes sense to save the data about the corresponding transformation (rotation angle and axis) in the name of each projection file. At this point, the database is ready and the algorithm is selected.
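The rotation step can be sketched as follows (a minimal numpy sketch; the axis-angle values and the file-naming scheme shown are illustrative, not the app's exact format):

```python
import numpy as np

def quat_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about axis."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    half = np.radians(angle_deg) / 2.0
    return np.concatenate([[np.cos(half)], np.sin(half) * axis])

def rotate(point, q):
    """Rotate a 3D point by quaternion q (rotation-matrix form of q p q^-1)."""
    w, x, y, z = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    return R @ np.asarray(point, dtype=float)

# Rotate the Model 90 degrees about the Z axis, then name the projection file
# after the transformation (angle + rotation axis), as described in the text.
q = quat_from_axis_angle((0, 0, 1), 90)
p = rotate((1.0, 0.0, 0.0), q)
filename = "model_a90_x0_y0_z1.png"   # hypothetical naming scheme
```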

It remains to automate the algorithm over the N projections. At this stage, the difficulty of choosing a criterion for determining the best matching result arises. The algorithm returns the best matches of key points, as well as the coordinates of the 4-gon, which may degenerate in the worst case. After thinking the problem over, we made the following assumptions (their validity was tested experimentally):

- The greater the number of points from the best matches set, the greater the similarity between the two projections;
- The boundary of a 4-gon tends to a rectangle;
- The larger the area of the bounding 4-gon, the greater the similarity between the two projections;
- If none of the previous assumptions turns out to be correct: test and apply methods of probability theory / functional analysis to find similarities between 2D objects by key points, for example, by computing the average error.

The analysis demonstrated that the first assumption is true: the larger the number of matched key points, the more similar the Fragment's projection is to the Model's projection.

*Pic. 10. Summary table comparing the N projections of the Model with the Fragment's projection*
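The second and third assumptions, that a good bounding 4-gon tends to a rectangle and has a large area, can be checked with simple geometry (a minimal sketch with made-up quads; these are not the app's exact thresholds):

```python
import numpy as np

def quad_area(quad):
    """Shoelace area of a 4-gon; near-zero means a degenerate bounding quad."""
    x, y = np.asarray(quad, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def rectangularity(quad):
    """Mean |cos| of the quad's corner angles: 0 for a perfect rectangle,
    close to 1 for a collapsed sliver."""
    q = np.asarray(quad, dtype=float)
    cosines = []
    for i in range(4):
        a = q[i - 1] - q[i]
        b = q[(i + 1) % 4] - q[i]
        cosines.append(abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(cosines))

square = [(0, 0), (10, 0), (10, 10), (0, 10)]           # a good match
sliver = [(0, 0), (10, 1), (20, 2), (10, 1.1)]          # nearly degenerate
```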

### 4. Calculation of Model position coordinates in screen coordinates

At this stage, we have the following data:

- OBJ Model and Fragment files;
- PNG file of the Fragment projection;
- PNG file of the best Model projection (with the rotation parameters in the file name);
- Coordinates of the 4-gon vertices bounding the Model's projection on the Fragment's projection.

An additional offset calculation is needed to overlay the Model correctly on the Fragment.

Let's return to the process of creating a 2D projection for a 3D object (Pic. 9). If we apply the inverse transformation from screen coordinates to 3D coordinates, we can calculate the required offset. To do this, we use an OpenGL unproject function (such as gluUnProject) together with the saved transformation matrices.

The function returns the 3D coordinates (x, y, z). However, the value of the z coordinate (depth) will not be correct.

*Pic. 15. Visualization of the transformation of a point P(x, y, z) to P'(x, y)*

The z-coordinate can take the value of any point lying on the OP ray.
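The ambiguity is easy to demonstrate with a small numpy sketch of the inverse transformation (the camera parameters are the same illustrative values as before, not the app's): two different assumed depths for the same pixel give two different points on the same ray.

```python
import numpy as np

# Simplified camera: looking down -Z from distance 5, focal factor 2,
# near = 1, far = 10 (all values are illustrative).
V = np.eye(4); V[2, 3] = -5.0
P = np.array([[2.0, 0.0,  0.0,       0.0],
              [0.0, 2.0,  0.0,       0.0],
              [0.0, 0.0, -11.0/9.0, -20.0/9.0],
              [0.0, 0.0, -1.0,       0.0]])
PV_inv = np.linalg.inv(P @ V)

def unproject(sx, sy, depth, width=640, height=480):
    """Screen coordinates + an assumed NDC depth -> 3D point
    (the same idea as gluUnProject)."""
    ndc = np.array([sx / width * 2 - 1, sy / height * 2 - 1, depth, 1.0])
    p = PV_inv @ ndc
    return p[:3] / p[3]

# The same pixel with two different depths gives two points on the ray OP:
p1 = unproject(320, 240, 0.0)
p2 = unproject(320, 240, 0.5)
```

Both points share the same x and y (the center pixel maps to the optical axis), but their z values differ, which is exactly the depth ambiguity described above.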

However, we do have the x and y offsets, as well as the arrays of local 3D coordinates of the Model and the Fragment from the OBJ files. Therefore, by applying the transformation matrix to the Model (rotating it by the angle calculated earlier) and shifting it by the x and y offsets, we can find the corresponding entries in the array of Fragment vertex coordinates. It is then easy to compute the difference between the z coordinates of those array elements. Now we know the offset in all three coordinates (x, y, z).
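A minimal sketch of this lookup, assuming the transformed Model vertices line up with Fragment vertices in x and y within a tolerance (the tolerance and the toy data are ours, not the app's):

```python
import numpy as np

def z_offset(model_pts, fragment_pts, tol=1e-3):
    """For each transformed Model vertex, find a Fragment vertex with
    (almost) the same x, y and return the mean difference in z.
    model_pts are assumed to be already rotated and shifted in x, y."""
    diffs = []
    for mx, my, mz in model_pts:
        d = np.hypot(fragment_pts[:, 0] - mx, fragment_pts[:, 1] - my)
        i = int(np.argmin(d))          # nearest Fragment vertex in the xy plane
        if d[i] < tol:
            diffs.append(fragment_pts[i, 2] - mz)
    return float(np.mean(diffs))

# Toy data: the Fragment is the Model shifted by 2.5 along z.
model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0.2]], dtype=float)
fragment = model + np.array([0, 0, 2.5])
dz = z_offset(model, fragment)
```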

## Result

To check the validity of the algorithm for calculating the Model's rotation and displacement, we can render the OBJ files of the Fragment and the Model. The result is shown below.

*Pic. 16. Rendering Model.obj (red) & Fragment.obj (dark blue) before transforming Model.obj*

*Pic. 17. Rendering Model.obj (red) & Fragment.obj (dark blue) after transforming Model.obj*

## Summary

An important point in the development process is speeding up the algorithm. The study identified three main ways to reduce the running time of the comparison algorithm:

- reducing the resolution of the projections' PNG files (halving the resolution speeds up the algorithm almost 2 times);
- reducing the number of key points in the two projection images (configurable in the algorithm code via the parameters of the OpenCV function cv::xfeatures2d::SURF::create). This can cut the execution time almost 3 times; the key is to keep a balance between reducing the key points and still determining the best result;
- using multithreading (research into this is ongoing).
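As a quick illustration of the first optimization: halving the resolution in each dimension quarters the number of pixels the matcher has to process (a crude numpy sketch; the real pipeline would use something like cv2.resize):

```python
import numpy as np

def halve_resolution(img):
    """Drop every second row and column - a crude 2x downscale
    (a proper resize with interpolation would be used in practice)."""
    return img[::2, ::2]

img = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 "image"
small = halve_resolution(img)                        # 2x2, a quarter of the pixels
```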

In this article, we discussed the algorithm for identifying the scanned object and calculating the position of the matched 3D model for display in the scanned object's coordinates. The algorithm is implemented as a DLL connected to the server side of the project. In the next article, we will talk about the implementation of the client part of the project, which provides the 3D data of the scanned fragment and sends it to the server to identify the 3D model.

## References

- OpenCV tutorial: Features2D + Homography to find a known object
- OpenCV documentation: cv::xfeatures2d::SURF::create

### About author

Volha Dzeranchuk is a C++ developer. Education: Bachelor's degree (graduated in 2012), BSU, Faculty of Mechanics and Mathematics. More than 8 years of experience in IT: development of GUI applications for Windows, creation of DLLs, and project support (C++, UE4, SQL/Qt, Visual Studio/VR). More than 3 years of teaching experience.