Homepage of Aditya Krishna Menon

Code for Cross-Modal Retrieval: A Pairwise Classification Approach, SDM 2015

The aim of this MATLAB code is to replicate the tables of results and figures from the paper Cross-Modal Retrieval: A Pairwise Classification Approach, appearing in SDM 2015.

The code comprises a main driver script, main.m, and several additional files organised into the following subfolders:

data/: MAT files for the datasets used in the paper.
helper/: miscellaneous helper scripts for creating train-test splits, constructing the cross-product of various parameters for tuning, et cetera.
learners/: implementations of all the learning methods used in the paper, such as CCA, the cross-product approach, and so on.
libraries/: certain third-party code, such as for optimisation. (Details are provided below.)
plotting/: scripts for plotting precision-recall (PR) curves.
results/: saved optimal parameters for each method on each dataset.
retrieval/: code for performing and assessing retrieval on a given dataset. (Details provided below.)
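
One way to read the "cross-product approach" listed under learners/ is as a linear classifier over the outer product of an image feature vector x and a text feature vector y; its score then equals the bilinear form x^T W y. This reading is an assumption, not a description of the files in learners/. The Python sketch below (all function names hypothetical) checks that identity on toy vectors. In a pairwise classification setup, each (x, y) pair would carry a label saying whether the two items are relevant to each other, and w could be fit with, for example, logistic regression.

```python
from typing import List, Sequence


def cross_product_features(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Flattened outer product x y^T in row-major order: one feature x[i] * y[j] per (i, j)."""
    return [xi * yj for xi in x for yj in y]


def linear_score(w: Sequence[float], features: Sequence[float]) -> float:
    """Score of a linear model with weight vector w on a feature vector."""
    return sum(wk * fk for wk, fk in zip(w, features))


def bilinear_score(W: Sequence[Sequence[float]], x: Sequence[float], y: Sequence[float]) -> float:
    """x^T W y, with W stored as a list of rows (len(x) rows, len(y) columns)."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


if __name__ == "__main__":
    x = [1.0, 2.0]          # toy "image" features
    y = [3.0, -1.0, 0.5]    # toy "text" features
    W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    w = [wij for row in W for wij in row]  # W flattened row-major, matching the feature order
    # The two scores agree (up to floating-point rounding).
    print(linear_score(w, cross_product_features(x, y)), bilinear_score(W, x, y))
```

The point of the identity is that learning a weight per (image feature, text feature) pair is the same as learning the matrix W of a bilinear compatibility score.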

Example usage

To replicate the table of results from the paper, unzip the code and run:

	>> main;

The MATLAB command window will then show the results of training each method on each dataset, using the previously saved optimal parameters. Sample output:

	[[Pascal]]
	performing svd...done [1.7209 secs]

	=== [ Random ] ===
	MAP = 0.0652 	 0.0652

	=== [ Linear regression ] ===
	MAP = 0.1317 	 0.1069

	=== [ CCA ] ===
	MAP = 0.1681 	 0.1422

	=== [ CMML ] ===
	MAP = 0.1802 	 0.1431

	...
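
The MAP figures above are mean average precision scores. As a hedged illustration (not the code in retrieval/), the Python sketch below computes average precision (AP) for one ranked list and averages it over queries, assuming an item counts as relevant when its category label matches the query's. The function names are hypothetical, and the exact conventions used in retrieval/ (for instance, any rank cutoff) may differ.

```python
from typing import List, Sequence


def average_precision(ranked_labels: Sequence[int], query_label: int) -> float:
    """AP of one ranked list; an item is relevant if its label equals the query's label."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[int]], query_labels: Sequence[int]) -> float:
    """MAP: the mean of the per-query AP values."""
    aps = [average_precision(r, q) for r, q in zip(rankings, query_labels)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Two toy queries; each list holds the labels of the retrieved items, best first.
    rankings = [[1, 0, 1, 2], [2, 2, 0, 1]]
    print(mean_average_precision(rankings, [1, 0]))
```

For the first toy query, the relevant items sit at ranks 1 and 3, so its AP is (1/1 + 2/3) / 2 = 5/6.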

Additionally, the script will produce PR curves for all three datasets.
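
A PR curve traces precision against recall as the rank cutoff grows. The Python sketch below computes these points for one ranked list, under the same assumption as above (relevance means a matching category label), plus interpolated precision at fixed recall levels. Averaging interpolated precision over queries is one common way to obtain a single curve per dataset; whether the scripts in plotting/ do exactly this is not stated here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_curve(ranked_labels: Sequence[int], query_label: int) -> List[Tuple[float, float]]:
    """(recall, precision) after each cutoff k = 1..n of one fully ranked list.

    An item is relevant if its label equals `query_label`; recall is measured
    against all relevant items in the list.
    """
    total_relevant = sum(1 for label in ranked_labels if label == query_label)
    points: List[Tuple[float, float]] = []
    hits = 0
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
        recall = hits / total_relevant if total_relevant else 0.0
        points.append((recall, hits / k))
    return points


def interpolated_precision(points: Sequence[Tuple[float, float]],
                           recall_levels: Sequence[float]) -> List[float]:
    """For each level, the best precision achieved at any recall >= that level."""
    return [max((p for r, p in points if r >= level), default=0.0) for level in recall_levels]


if __name__ == "__main__":
    points = pr_curve([1, 0, 1, 2], query_label=1)
    levels = [i / 10 for i in range(11)]  # recall levels 0.0, 0.1, ..., 1.0
    print(points)
    print(interpolated_precision(points, levels))
```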

To replicate the results of the case study, you must first download the full images and texts of the Wikipedia dataset from here. (This file is ~1.4 GB.) Once the download completes, unzip its contents into the libraries/ subfolder; this should create a new folder called wikipedia_dataset. You can then run:

	>> case_study;

This script performs 4 image-to-text and text-to-image queries. For both regimes, and for both the cross-product and CCA methods, it displays figures of the query and result images; the filenames of the corresponding query and result texts are printed to the console.

Detailed descriptions

This code builds upon the code released on the project pages for the ACM MM '10 paper of Rasiwasia et al. (reference [15] in our paper) and the CVPR '12 paper of Costa Pereira and Vasconcelos (reference [13] in our paper). Specifically, the following subfolders are taken from these sources:

data/: this mirrors the contents of the data/ subfolder of ris_cvpr12.zip
retrieval/xmodal_acmm10/: this mirrors the contents of the code/ subfolder of xmodal_acmm10.zip, with one change: in the file retrieval.m, we have modified the call to calc_distance so that it calls generalised_calc_distance instead. This allows the dot product to be used as a retrieval "metric".
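
A dot product is a similarity rather than a distance, so one simple way to plug it into code that ranks by ascending distance is to negate it. The Python sketch below illustrates that idea only; generalised_distance, rank and the metric strings are hypothetical names and do not reflect the interface of calc_distance or generalised_calc_distance.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def generalised_distance(queries: List[Vector], database: List[Vector], metric: str) -> List[List[float]]:
    """Pairwise 'distances' (smaller = more similar) between query and database vectors.

    For metric == "dot", the negated dot product is returned, so the same ascending
    sort used for a genuine distance ranks the most similar items first.
    """
    def score(q: Vector, d: Vector) -> float:
        if metric == "euclidean":
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))
        if metric == "dot":
            return -sum(a * b for a, b in zip(q, d))
        raise ValueError(f"unknown metric: {metric}")

    return [[score(q, d) for d in database] for q in queries]


def rank(distances_row: Sequence[float]) -> List[int]:
    """Database indices ordered from most to least similar (stable on ties)."""
    return sorted(range(len(distances_row)), key=lambda j: distances_row[j])


if __name__ == "__main__":
    query = [[1.0, 0.0]]
    database = [[2.0, 0.0], [0.5, 0.0], [0.0, 3.0]]
    for metric in ("euclidean", "dot"):
        print(metric, rank(generalised_distance(query, database, metric)[0]))
```

The two choices can rank differently: Euclidean distance favours the database vector closest to the query, whereas the dot product also rewards vectors with a large norm in the query's direction.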

In addition, we rely on the following external libraries, copies of which are provided in the libraries/ subfolder:

minFunc: this is used to perform L-BFGS minimisation of several objective functions.
liblinear-1.94: this is used to train logistic regression for the SCM method.

As noted above, the script main.m uses previously saved optimal parameters. Should you wish to search for these optimal parameters instead, change line 40 of main.m to read:

	TUNE = 1;
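
Tuning of this kind typically scores every candidate setting drawn from the cross-product of per-parameter value lists (the kind of cross-product that helper/ constructs) and keeps the best one. The Python sketch below shows that grid search under stated assumptions: the parameter names, the toy scoring function and the helper names are hypothetical, and how main.m actually scores settings (which validation split, which criterion) is not stated here.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence


def parameter_grid(space: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """All combinations (the cross-product) of the candidate values in `space`."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


def tune(space: Dict[str, Sequence[Any]],
         validation_score: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Setting with the highest validation score; the first one wins ties."""
    return max(parameter_grid(space), key=validation_score)


def toy_score(params: Dict[str, Any]) -> float:
    """Hypothetical stand-in for, e.g., MAP on a held-out validation split."""
    return -abs(params["reg"] - 0.1) - abs(params["dim"] - 20) / 100


if __name__ == "__main__":
    space = {"reg": [0.01, 0.1, 1.0], "dim": [10, 20]}  # hypothetical parameters
    print(len(parameter_grid(space)), tune(space, toy_score))
```

Exhaustive search is simple and reproducible, but its cost grows multiplicatively with each added parameter, which is one reason to save the optimal settings once they are found.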