ViDRILO: The Visual and Depth Robot Indoor Localization with Objects information dataset

Tools

The ViDRILO dataset is released in conjunction with a MATLAB toolbox that provides the following capabilities:

  • Generation and evaluation of multimodal semantic localization systems using sequences of the dataset for training/test
    • Descriptor generation from perspective images
      • GIST
      • Pyramid Histogram of Oriented Gradients (PHOG)
      • Greyscale Histogram
    • Descriptor generation from point clouds
      • Ensemble of Shape Functions (ESF)
      • Depth Histogram
    • Learning and classification stages
      • Support Vector Machines
      • k-Nearest Neighbor (kNN)
      • Random Forest
    • Evaluation of the results: generation of figures
      • Confusion matrix for room classification
      • Precision/recall for object recognition
      • ROC curves for object recognition
    • Evaluation of the results: metrics
      • Accuracy
      • Root Mean Squared Error (RMS)
      • True Positive and False Positive Rates
      • Precision
      • Recall
      • F1-Score
      • Area Under the Curve (AUC)
      • Precision at recall levels 0.25, 0.50 and 0.75 (object recognition)
  • Generation of dataset statistics
    • Room distribution
    • Object distribution
    • Object and room relationships (P(Room=r|Object=o) and P(Object=o|Room=r))
  • Visual representation of the dataset point cloud files, without any additional requirements such as the PCL library or OpenCV
  • Visualization for dataset frames
    • Visual and depth images
    • Grayscale and Depth histograms

The toolbox includes a complete user guide with detailed installation and usage instructions.

Toolbox Files

The ViDRILO toolbox has the following main files:

  • ConfigurationVidrilo.mat: Configuration file containing the full dataset annotations and the paths to the sequences (see the inspection sketch after this list).
  • visualizePointCloud.m: RGB-D visualization (visual image and point cloud file).
  • showDatasetOverallStats.m: Dataset statistics visualization.
  • evaluateExternalResults.m: Evaluates room classification and object recognition results stored in an external .csv file against a specific dataset sequence.
  • runVidriloClassifier.m: Room classification and object recognition for a given combination of:
    • Training and test sequences
    • Source of information: visual, depth or both combined
    • Type of classification model: SVM, kNN or RFs
    • Type of visual descriptor: GIST, PHOG or Greyscale Histogram
    • Type of depth descriptor: ESF or Depth Histogram
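
As an illustration of the configuration file, the short sketch below inspects ConfigurationVidrilo.mat directly from MATLAB; the variable names stored inside the file are not documented here, so the sketch only enumerates them rather than assuming them:

    % List the variables stored in the configuration file without loading them
    whos('-file', 'ConfigurationVidrilo.mat')

    % Load the annotations and sequence paths into a structure and show their names
    config = load('ConfigurationVidrilo.mat');
    disp(fieldnames(config))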

Thanks to the visualizePointCloud function, it is possible to generate MATLAB figures with a visual representation of the point cloud files. Two different types of visualization are provided. The first type shows the colour and depth information for a single frame and visualizes some features extracted from them; concretely, it shows the histogram extracted from each image.
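
For reference, the sketch below reproduces the idea of this first visualization using only standard MATLAB functions; it is not the toolbox code itself, and the image file names are placeholders:

    % Placeholder file names: substitute a visual/depth image pair from a ViDRILO frame
    rgb   = imread('frame_rgb.png');                  % colour image of the frame
    depth = imread('frame_depth.png');                % depth image of the frame
    grey  = rgb2gray(rgb);                            % greyscale version of the colour image

    figure;
    subplot(2,2,1); imshow(rgb);                      title('Visual image');
    subplot(2,2,2); imagesc(depth); axis image;       title('Depth image');
    subplot(2,2,3); histogram(grey(:), 64);           title('Greyscale histogram');
    subplot(2,2,4); histogram(double(depth(:)), 64);  title('Depth histogram');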

The second type loads a point cloud file into a manipulable MATLAB surface figure (see next figure). This figure allows the viewpoint of the scene to be changed. Despite the existence of more powerful alternatives (such as the PCL viewer), the released .pcd visualizer does not require the installation of any additional software.
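
A minimal sketch of this kind of dependency-free rendering, assuming an organized depth image aligned with the colour image (rather than the toolbox's own .pcd reader), could look as follows:

    % Placeholder input: organized depth image aligned with the colour image
    depth = double(imread('frame_depth.png'));   % depth values used as surface height
    rgb   = im2double(imread('frame_rgb.png'));  % colour image used to texture the surface

    figure;
    surf(depth, rgb, 'EdgeColor', 'none');       % render the scene as a textured surface
    axis ij; axis tight;
    view(3); rotate3d on;                        % viewpoint can be changed interactively
    title('Point cloud rendered as a MATLAB surface (no PCL required)');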

The showDatasetOverallStats function loads the configuration file and generates two graphs: the probability of being in a given room once an object has been recognized, P(Room=r|Object=o), and vice versa, P(Object=o|Room=r).
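
These conditional probabilities can be estimated directly from per-frame annotations. The sketch below assumes a binary frame-by-object occurrence matrix and a per-frame room label; this layout (and the toy data) is an assumption for illustration, not the toolbox's internal representation:

    % Toy annotation data (placeholders for the real ViDRILO annotations)
    nFrames = 200; nObjects = 15; nRooms = 10;
    roomId  = randi(nRooms, nFrames, 1);        % room label of each frame
    objOcc  = rand(nFrames, nObjects) > 0.7;    % objOcc(i,j) = true if object j appears in frame i

    pObjGivenRoom = zeros(nObjects, nRooms);    % P(Object = o | Room = r)
    pRoomGivenObj = zeros(nRooms, nObjects);    % P(Room = r | Object = o)

    for r = 1:nRooms
        inRoom = (roomId == r);
        pObjGivenRoom(:, r) = sum(objOcc(inRoom, :), 1)' / max(sum(inRoom), 1);
    end
    for o = 1:nObjects
        withObj = objOcc(:, o);
        pRoomGivenObj(:, o) = histcounts(roomId(withObj), 0.5:1:nRooms + 0.5)' / max(sum(withObj), 1);
    end

    figure; imagesc(pRoomGivenObj); colorbar; title('P(Room = r | Object = o)');
    figure; imagesc(pObjGivenRoom); colorbar; title('P(Object = o | Room = r)');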

The runVidriloClassifier function performs the basic steps of both the visual place classification and the object recognition problems: descriptor generation, learning, classification, and evaluation of the results. For descriptor generation, we include five global descriptors (GIST, PHOG, ESF, Greyscale Histogram and Depth Histogram). The generated descriptors are then used as input for a classification model; we include three different classifiers: SVM, kNN and RF. For room classification we train a single multi-class classifier, while for object recognition we train a separate binary classifier for each object. The evaluation of the results computes different statistics for room classification and object recognition. Namely, a confusion matrix is generated for the room decisions. With respect to object recognition, the toolbox generates a precision/recall graph and a figure with the ROC curves for all the objects. Other metrics are also computed: accuracy, RMS error, precision, recall, AUC, and precision at fixed recall levels.
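
A condensed sketch of this pipeline, using MATLAB's Statistics and Machine Learning Toolbox as a stand-in for the toolbox's own descriptor and classifier code (the descriptor matrices and labels below are random placeholders, not ViDRILO data):

    % Random placeholder descriptors and annotations so the sketch runs on its own
    d = 32; nObjects = 15; nRooms = 10;
    Xtrain = rand(300, d);  roomTrain = randi(nRooms, 300, 1);  objTrain = rand(300, nObjects) > 0.7;
    Xtest  = rand(100, d);  roomTest  = randi(nRooms, 100, 1);  objTest  = rand(100, nObjects) > 0.7;

    % Rooms: a single multi-class classifier (here a kNN model)
    roomModel = fitcknn(Xtrain, roomTrain, 'NumNeighbors', 5);
    roomPred  = predict(roomModel, Xtest);
    accuracy  = mean(roomPred == roomTest);
    confMat   = confusionmat(roomTest, roomPred);    % room confusion matrix

    % Objects: one binary classifier per object
    objPred = false(size(objTest));
    for o = 1:nObjects
        objModel      = fitcknn(Xtrain, double(objTrain(:, o)), 'NumNeighbors', 5);
        objPred(:, o) = predict(objModel, Xtest) == 1;
    end
    precision = nnz(objPred & objTest) / max(nnz(objPred), 1);
    recall    = nnz(objPred & objTest) / max(nnz(objTest), 1);
    f1        = 2 * precision * recall / max(precision + recall, eps);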

As an example of the use of the toolbox, the three graphs below are obtained when calling runVidriloClassifier(2,1,'visual','knn','gist'). This involves the generation of a kNN classifier from the GIST features extracted from Sequence 2 and its evaluation against Sequence 1.
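
The same call is shown below together with some commented-out variations; the variations follow the parameter roles listed for runVidriloClassifier above, but any argument string other than those in the documented example is an assumption:

    % Documented example: train on Sequence 2, evaluate on Sequence 1,
    % using only visual information, a kNN classifier and the GIST descriptor
    runVidriloClassifier(2, 1, 'visual', 'knn', 'gist');

    % Hypothetical variations (argument strings assumed from the option list above):
    % runVidriloClassifier(2, 1, 'visual', 'svm', 'phog');
    % runVidriloClassifier(2, 1, 'depth',  'knn', 'esf');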

Regarding the metrics obtained, the toolbox generates the following output:

## Only Visual Information
## Classification Model: kNN
## Visual Descriptor: GIST
### ROOMS CLASSIFICATION - DETAILED RESULTS BY ROOM ###

Room TP Rate FP Rate Precision Recall F1-Score ROC Area
CR 0.90473 0.28693 0.79920 0.90473 0.84870 0.80890
HA 0.04854 0.01794 0.10870 0.04854 0.06711 0.51530
PO 0.23387 0.00927 0.58000 0.23387 0.33333 0.61230
SO 0.66452 0.05640 0.44978 0.66452 0.53646 0.80406
TR 0.53676 0.01509 0.68224 0.53676 0.60082 0.76084
TO 0.55372 0.05026 0.37017 0.55372 0.44371 0.75173
SE 0.21429 0.00611 0.60000 0.21429 0.31579 0.60409
VC 0.27517 0.00893 0.67213 0.27517 0.39048 0.63312
WH 0.51429 0.02285 0.40449 0.51429 0.45283 0.74572
EA 0.39000 0.01879 0.47561 0.39000 0.42857 0.68561
W.Avg: 0.67811 0.17068 0.66579 0.67811 0.65374 0.75371

### ROOMS CLASSIFICATION - OVERALL RESULTS ###

### ROOMS: WELL CLASSIFIED: 1620.
### ROOMS: BAD CLASSIFIED: 769.

### Accuracy: 67.81.
### Root Mean Squared Error: 0.25373.

### OBJECT RECOGNITION - DETAILED RESULTS BY OBJECT ###
Object TP Rate FP Rate Precision Recall F1-Score ROC Area
Ben 0.29921 0.03747 0.48718 0.29921 0.37073 0.63087
Ext 0.50000 0.15513 0.35772 0.50000 0.41706 0.67243
Com 0.67340 0.04015 0.70423 0.67340 0.68847 0.81662
Tab 0.70615 0.04359 0.78481 0.70615 0.74341 0.83128
Cha 0.61741 0.03799 0.80902 0.61741 0.70034 0.78971
Boa 0.46269 0.12486 0.55578 0.46269 0.50498 0.66891
Pri 0.41143 0.02800 0.53731 0.41143 0.46602 0.69171
Boo 0.67687 0.04964 0.65677 0.67687 0.66667 0.81361
Uri 0.55556 0.03169 0.28846 0.55556 0.37975 0.76193
Sin 0.31579 0.00380 0.40000 0.31579 0.35294 0.65600
Han 0.00000 0.00000 0.00000 0.00000 0.00000 0.50000
Scr 0.15854 0.01040 0.35135 0.15854 0.21849 0.57407
Tra 0.26011 0.08901 0.47742 0.26011 0.33675 0.58555
Pho 0.08889 0.01000 0.25806 0.08889 0.13223 0.53944
Fri 0.66667 0.01329 0.55072 0.66667 0.60317 0.82669
W.Avg: 0.49219 0.06978 0.58309 0.49219 0.52470 0.71121

### OBJECT RECOGNITION - OVERALL RESULTS ###

### OBJECTS: TOTAL NUMBER OF OBJECTS WELL DETECTED: 1860 ###
### OBJECTS: TOTAL NUMBER OF OBJECTS BAD DETECTED: 1349 ###
### OBJECTS: TOTAL NUMBER OF OBJECTS NOT DETECTED: 1919 ###

### Root Mean Squared Error: 0.30199.
### Average Precision: 0.58.
### Average Recall: 0.49.
### Average F1 score: 0.53.
### Average Area Under Curve (ROC): 0.69.
### Average Precision at 0.25 Recall Level: 0.20.
### Average Precision at 0.50 Recall Level: 0.38.
### Average Precision at 0.75 Recall Level: 0.48.
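
As a quick sanity check (not toolbox code), the overall precision, recall and F1 values can be reproduced from the detection counts printed above, interpreting well detected as true positives, badly detected as false detections and not detected as missed objects:

    tp = 1860;   % objects well detected
    fp = 1349;   % objects badly detected
    fn = 1919;   % objects not detected

    precision = tp / (tp + fp);                                % ~0.58
    recall    = tp / (tp + fn);                                % ~0.49
    f1        = 2 * precision * recall / (precision + recall); % ~0.53
    fprintf('Precision %.2f, Recall %.2f, F1 %.2f\n', precision, recall, f1);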