Trailers12k Dataset

Overview

Trailers12k is a movie trailer dataset comprising 12,000 titles associated with ten genres. It is distinguished from other datasets by its collection procedure, which aims to provide a high-quality, publicly available dataset. The following table compares Trailers12k to other similar datasets.

[Image: Trailers12k comparison table]

A detailed explanation of the collection procedure, the dataset statistics and the process used to compute trailer representations can be found in the paper Improving Transfer Learning for Movie Trailer Genre Classification using a Dual Image and Video Transformer.

This dataset is an updated and polished version of Trailers15k. In addition to manually curated movie trailers, Trailers12k provides Kinetics video clip-level representations, ImageNet poster representations and rich metadata.

Data

All data is available on Zenodo and is summarized in the following table:

Content	Files
IMDb metadata & YouTube IDs	`metadata.json`
Trailer representations	`trailers_i_shufflenet_fpc24.zarr` `trailers_i_resnet_fpc24.zarr` `trailers_i_swin_fpc24.zarr` `trailers_k_shufflenet_fps24_fpc24.zarr` `trailers_k_r2plus1d_fps24_fpc24.zarr` `trailers_ik_swin_fps24_fpc24.zarr`
Poster representations	`posters_i_swin.zarr`
MTGC evaluation splits	`mtgc.csv`

Trailer Representations

The following table describes trailer image (frame-level) and video (clip-level) representations:

	ImageNet-1k	Kinetics-400	Backbone	File
Frame-level	✔		2D ShuffleNet-V2-1x	`trailers_i_shufflenet_fpc24.zarr`
	✔		ResNet50	`trailers_i_resnet_fpc24.zarr`
	✔		2D Tiny Swin-Transformer	`trailers_i_swin_fpc24.zarr`
Clip-level		✔	3D ShuffleNet-V2-1x	`trailers_k_shufflenet_fps24_fpc24.zarr`
		✔	ResNet (2+1)D	`trailers_k_r2plus1d_fps24_fpc24.zarr`
	✔	✔	3D Tiny Swin-Transformer	`trailers_ik_swin_fps24_fpc24.zarr`
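The file names in the table appear to follow a regular pattern, so the right file can be selected programmatically. The sketch below is an illustration only: `parse_representation_name` is a hypothetical helper, not part of the dataset tooling. The mapping of the `i`, `k` and `ik` prefixes to pretraining datasets is read off the table above; treating a name with an `fps` field as clip-level is likewise inferred from the table. Reading `fps` as frames per second and `fpc` as frames per clip would be a guess, so the values are returned under their original names.

```python
import re

# Prefix-to-pretraining mapping, inferred from the table above (assumption).
PRETRAINING = {
    'i': ['ImageNet-1k'],
    'k': ['Kinetics-400'],
    'ik': ['ImageNet-1k', 'Kinetics-400'],
}

# trailers_<prefix>_<backbone>[_fps<N>]_fpc<N>.zarr
_PATTERN = re.compile(
    r'^trailers_(?P<prefix>ik|i|k)_(?P<backbone>[a-z0-9]+)'
    r'(?:_fps(?P<fps>\d+))?_fpc(?P<fpc>\d+)\.zarr$'
)


def parse_representation_name(filename):
    """Split a trailer representation file name into its components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f'unrecognized representation file name: {filename}')
    fps = match.group('fps')
    return {
        'pretraining': PRETRAINING[match.group('prefix')],
        'backbone': match.group('backbone'),
        # In the table, only clip-level files carry an fps field.
        'level': 'clip' if fps is not None else 'frame',
        'fps': int(fps) if fps is not None else None,
        'fpc': int(match.group('fpc')),
    }
```

For example, `parse_representation_name('trailers_ik_swin_fps24_fpc24.zarr')` reports a clip-level Swin representation pretrained on both ImageNet-1k and Kinetics-400.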

Representations are stored in Zarr format. After extraction, a trailer's representation can be loaded with the following code:

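import zarr

# Open the extracted store in read-only mode.
z = zarr.open('trailers_ik_swin_fps24_fpc24.zarr', mode='r')
# Feature dimensionality, stored as an attribute of the store.
num_features = z.attrs['num_features']
# Each trailer is stored under its movie identifier.
arr = z['tt0347149']
arr.shape, arr.dtype  # ((96, 768), dtype('float32'))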

A full implementation of a PyTorch Dataset is available in the DIViTA repo.
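For quick experiments that do not need a full PyTorch pipeline, several trailers can be reduced to one fixed-size vector each. The sketch below is a simple mean-pooling baseline and not the method used in the paper. It assumes that each stored array has shape `(num_units, num_features)`, as the `(96, 768)` example above suggests, and that the store supports `in` and `[]` lookups by movie identifier. `mean_pool_trailers` is a hypothetical helper name.

```python
import numpy as np


def mean_pool_trailers(store, mids):
    """Return a (len(mids), num_features) matrix of mean-pooled trailers.

    `store` is any mapping from movie identifier to a 2D array, such as
    an opened Zarr group or a plain dict of NumPy arrays.
    """
    pooled = []
    for mid in mids:
        if mid not in store:
            raise KeyError(f'no representation stored for {mid}')
        # Slicing with [:] reads the whole array into memory.
        arr = np.asarray(store[mid][:])
        # Average over the first axis (assumed to index frames or clips).
        pooled.append(arr.mean(axis=0))
    return np.stack(pooled)


# Usage with the real data (requires the extracted Zarr store):
# import zarr
# z = zarr.open('trailers_ik_swin_fps24_fpc24.zarr', mode='r')
# features = mean_pool_trailers(z, ['tt0347149'])
```

Mean pooling discards temporal order, so it is best treated as a sanity check before training a model on the full sequences.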

MTGC Evaluation

`mtgc.csv` provides a stratified three-fold evaluation split for the multi-label genre classification task, with the following columns:

mid: movie identifier.
action, adventure, comedy, crime, drama, fantasy, horror, romance, sci-fi & thriller: genres as binary labels.
split0, split1 & split2: subset to which the movie belongs in each split (0: training, 1: validation and 2: test).

The following snippet loads movie ids and genres of the validation subset in the third split:

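import pandas as pd

df = pd.read_csv('mtgc.csv')
# Keep movies in the validation subset (1) of the third split (split2).
df = df[df['split2'] == 1]
# Keep the first 11 columns: mid plus the ten genre labels.
df = df.iloc[:, 0:11]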
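The same filtering can be wrapped in a small function that returns identifiers and labels for any split and subset. This is a minimal sketch: `load_subset` and `SUBSETS` are hypothetical names, and the code assumes the column layout described above, with `mid` first, the ten genre columns next, and the `split0` to `split2` columns after them.

```python
import pandas as pd

# Subset codes as documented for the split columns.
SUBSETS = {'train': 0, 'val': 1, 'test': 2}


def load_subset(df, split, subset):
    """Return (mids, labels) for one subset of one evaluation split.

    split: 0, 1 or 2, selecting the split0, split1 or split2 column.
    subset: 'train', 'val' or 'test'.
    """
    rows = df[df[f'split{split}'] == SUBSETS[subset]]
    mids = rows['mid'].tolist()
    # Columns 1 to 10 are assumed to hold the ten binary genre labels.
    labels = rows.iloc[:, 1:11].to_numpy(dtype='float32')
    return mids, labels


# Usage with the real file:
# df = pd.read_csv('mtgc.csv')
# val_mids, val_labels = load_subset(df, split=2, subset='val')
```

Selecting the genre columns by position mirrors the snippet above and avoids depending on the exact spelling of the genre column names.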

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Citing

If you find this work useful in your research, please consider citing:

@article{Trailers12k-2023103343,
title = {Improving Transfer Learning for Movie Trailer Genre Classification using a Dual Image and Video Transformer},
journal = {Information Processing \& Management},
volume = {60},
number = {3},
pages = {103343},
year = {2023},
issn = {0306-4573},
doi = {10.1016/j.ipm.2023.103343},
url = {https://www.sciencedirect.com/science/article/pii/S0306457323000808},
author = {Ricardo Montalvo-Lezama and Berenice Montalvo-Lezama and Gibran Fuentes-Pineda},
keywords = {Multi-label classification, Transfer learning, Trailers12k, Spatio-temporal analysis, Video analysis, Transformer model},
}

People

Please feel free to contact us if you have questions.