The Zero benchmark is entirely open and freely accessible.

WARNING: This large-scale benchmark is built for research purposes only, to enable large-scale model training for a broad range of researchers and other interested communities. It is not suitable for any real-world production or application.


Zero, a large-scale Chinese cross-modal benchmark, contains two pre-training datasets (Zero-Corpus and Zero-Corpus-Sub) and five downstream datasets.

Pre-training datasets

  • 23 million dataset (Zero-Corpus). Zero-Corpus is collected from a search engine and contains images with their corresponding textual descriptions. It is filtered from 5 billion image-text pairs by user click-through rate.
  • 2.3 million dataset (Zero-Corpus-Sub). A sub-dataset of Zero-Corpus. Training vision-language pre-training (VLP) models on Zero-Corpus may demand overwhelming GPU resources, so a sub-dataset containing 10% of the image-text pairs is also provided for research purposes.
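
The two construction steps described above (filtering by click-through rate, then taking a 10% subset) can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`clicks`, `impressions`), the threshold `min_ctr`, and uniform random sampling are all hypothetical, since the actual filtering rule and the way Zero-Corpus-Sub was drawn are not described here.

```python
import random


def filter_by_ctr(pairs, min_ctr):
    """Keep pairs whose click-through rate (clicks / impressions) is at least min_ctr.

    The field names and the threshold are assumptions made for this sketch.
    """
    kept = []
    for pair in pairs:
        if pair["impressions"] == 0:
            continue  # CTR is undefined without impressions, so skip the pair
        if pair["clicks"] / pair["impressions"] >= min_ctr:
            kept.append(pair)
    return kept


def sample_subset(pairs, fraction, seed=0):
    """Draw a reproducible uniform random subset (assumed sampling scheme)."""
    rng = random.Random(seed)
    subset_size = round(len(pairs) * fraction)
    return rng.sample(pairs, subset_size)


# Toy records standing in for image-text pairs with click statistics.
pairs = [
    {"image": f"img_{i}.jpg", "text": f"caption {i}", "clicks": i % 10, "impressions": 10}
    for i in range(100)
]

filtered = filter_by_ctr(pairs, min_ctr=0.5)   # analogous to building Zero-Corpus
subset = sample_subset(filtered, fraction=0.1)  # analogous to building Zero-Corpus-Sub
```

In this toy setup, half of the records pass the threshold and one tenth of those are kept in the subset.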

Downstream datasets

  • ICM. Curated for the image-text matching task. It contains 400,000 image-text pairs, including 200,000 positive cases and 200,000 negative cases.
  • IQM. Also a dataset for the image-text matching task. Unlike ICM, it uses the search query instead of a detailed description text. Like ICM, IQM contains 200,000 positive cases and 200,000 negative cases.
  • ICR. We collect 200,000 image-text pairs. The dataset covers both the image-to-text retrieval and text-to-image retrieval tasks.
  • IQR. Also proposed for the image-text retrieval task. We randomly select 200,000 queries and their corresponding images as the annotated image-query pairs, similar to IQM.
  • Flickr30k-CNA. We gathered professional English and Chinese linguists to meticulously re-translate all data of Flickr30k and double-check each sentence. Beijing Magic Data Technology Co., Ltd. contributed to the translation of this dataset.
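
For the retrieval datasets above (ICR, IQR, Flickr30k-CNA), a commonly used metric is Recall@K. The text above does not state which metric the benchmark uses, so the following is a minimal sketch under that assumption. The similarity matrix is toy data, and the function names are hypothetical; in practice the scores would come from a trained image-text model.

```python
def recall_at_k(similarity, ground_truth, k):
    """Fraction of queries whose correct candidate is ranked within the top k.

    similarity[i][j] is the score between query i and candidate j;
    ground_truth[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def transpose(matrix):
    """Swap rows and columns so texts become the queries and images the candidates."""
    return [list(column) for column in zip(*matrix)]


# Toy image-to-text similarity matrix: row i is image i, column j is text j.
# Image i is paired with text i in this example.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
pairs = [0, 1, 2]

i2t_r1 = recall_at_k(sim, pairs, k=1)             # image-to-text Recall@1
i2t_r2 = recall_at_k(sim, pairs, k=2)             # image-to-text Recall@2
t2i_r1 = recall_at_k(transpose(sim), pairs, k=1)  # text-to-image Recall@1
```

Both retrieval directions reuse the same function: transposing the matrix turns text-to-image retrieval into the same ranking problem.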


     title = {Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework},
     author = {Xie, Chunyu and Cai, Heng and Song, Jianfei and Li, Jincheng and Kong, Fanjing and Wu, Xiaoyu
                   and Morimitsu, Henrique and Yao, Lin and Wang, Dexin and Zhang, Xiangzheng and Leng, Dawei and Ji, Xiangyang and Deng, Yafeng},
     journal = {arXiv preprint arXiv:2205.03860},
     year = {2022}