Zero

Download

Download Pre-training datasets

Pre-training datasets

  • 2.3M dataset [Google Drive] [Baidu Drive]
  • 23M dataset [Google Drive] [Baidu Drive]

Texts of Downstream datasets

  • ICM [Google Drive] [Baidu Drive]
  • IQM [Google Drive] [Baidu Drive]
  • ICR [Google Drive] [Baidu Drive]
  • IQR [Google Drive] [Baidu Drive]
  • Flickr30k-CNA [Google Drive] [Baidu Drive]. We provide high-quality re-translated texts for Flickr30k.

Images of Downstream datasets

We use the same images for ICM, IQM, ICR and IQR. The URLs of the images are as follows:

  • IQR_IQM_ICR_ICM_images_base64_package_0.tar.gz [Google Drive] [Baidu Drive]
  • IQR_IQM_ICR_ICM_images_base64_package_1.tar.gz [Google Drive] [Baidu Drive]
  • IQR_IQM_ICR_ICM_images_base64_package_2.tar.gz [Google Drive] [Baidu Drive]
  • IQR_IQM_ICR_ICM_images_base64_package_3.tar.gz [Google Drive] [Baidu Drive]
  • IQR_IQM_ICR_ICM_images_base64_package_4.tar.gz [Google Drive] [Baidu Drive]

Note that all images are encoded using base64. To decode an image, you can use the following code:

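import base64

def decoder_base64(orig_path, save_path):
    '''
    Decode one base64-encoded image file and save it as a binary image.

    orig_path: /input_base64/t012a0d0bef441bf584
    save_path: /output_image/t012a0d0bef441bf584.jpg
    '''
    # Read the base64 payload as raw bytes.
    with open(orig_path, "rb") as f:
        base64_data = f.read()
    # Decode the payload and write the image bytes.
    with open(save_path, "wb") as f:
        f.write(base64.b64decode(base64_data))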
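
To decode a whole extracted package rather than a single file, a loop over the directory may be convenient. The following is only a minimal sketch under several assumptions: each package extracts to a flat directory with one base64 file per image, the file name is the image ID without an extension (as in the example path above), and the images are JPEGs. The function name decode_directory and its suffix parameter are hypothetical and not part of the released code.

```python
import base64
from pathlib import Path


def decode_directory(input_dir, output_dir, suffix=".jpg"):
    """Decode every base64 file in input_dir into an image file in output_dir.

    Assumes a flat directory with one base64-encoded image per file
    (an assumption about the package layout). Returns the number of
    files written.
    """
    in_root = Path(input_dir)
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)

    count = 0
    for src in sorted(in_root.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories
        image_bytes = base64.b64decode(src.read_bytes())
        # Append the assumed image suffix to the original file name.
        (out_root / (src.name + suffix)).write_bytes(image_bytes)
        count += 1
    return count
```

For example, decode_directory("/input_base64", "/output_image") would write /output_image/t012a0d0bef441bf584.jpg for the input file /input_base64/t012a0d0bef441bf584. If the actual package layout differs, adjust the directory traversal accordingly.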

For Flickr30k-CNA, please download the corresponding images from Flickr30k.