Download
Pre-training datasets
- 2.3M dataset [Google Drive] [Baidu Drive]
- 23M dataset [Google Drive] [Baidu Drive]
Texts of Downstream datasets
- ICM [Google Drive] [Baidu Drive]
- IQM [Google Drive] [Baidu Drive]
- ICR [Google Drive] [Baidu Drive]
- IQR [Google Drive] [Baidu Drive]
- Flickr30k-CNA [Google Drive] [Baidu Drive]: re-translated, high-quality texts for Flickr30k.
Images of Downstream datasets
ICM, IQM, ICR and IQR share the same images. The download links for the image packages are as follows:
- IQR_IQM_ICR_ICM_images_base64_package_0.tar.gz [Google Drive] [Baidu Drive]
- IQR_IQM_ICR_ICM_images_base64_package_1.tar.gz [Google Drive] [Baidu Drive]
- IQR_IQM_ICR_ICM_images_base64_package_2.tar.gz [Google Drive] [Baidu Drive]
- IQR_IQM_ICR_ICM_images_base64_package_3.tar.gz [Google Drive] [Baidu Drive]
- IQR_IQM_ICR_ICM_images_base64_package_4.tar.gz [Google Drive] [Baidu Drive]
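Each image in these packages is stored as a base64-encoded file. A single file can be decoded into an image as follows:

    import base64


    def decoder_base64(orig_path, save_path):
        '''
        orig_path: /input_base64/t012a0d0bef441bf584
        save_path: /output_image/t012a0d0bef441bf584.jpg
        '''
        # Read the base64-encoded content of one image file.
        with open(orig_path, "rb") as f:
            base64_data = f.read()
        # Decode it and write the raw image bytes to save_path.
        img = base64.b64decode(base64_data)
        with open(save_path, "wb") as f:
            f.write(img)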
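To decode a whole extracted package at once, the same idea can be applied to every file in a directory. The following is a minimal sketch, not part of the original release. The helper name `decode_base64_dir` and the `suffix` parameter are hypothetical. It assumes the package has already been extracted (for example with `tar -xzf`) and that each file holds the base64 text of exactly one image and is named by its image ID without an extension, as in the path example above.

```python
import base64
from pathlib import Path


def decode_base64_dir(input_dir, output_dir, suffix=".jpg"):
    """Decode every base64 file in input_dir into an image file in output_dir.

    Assumption: each regular file in input_dir contains the base64 text of
    one image and is named by image ID with no extension.
    Returns the number of files written.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(input_dir.iterdir()):
        if not src.is_file():
            continue  # skip nested directories
        data = base64.b64decode(src.read_bytes())
        # Keep the image ID as the file name and append the suffix.
        (output_dir / (src.name + suffix)).write_bytes(data)
        count += 1
    return count
```

For example, `decode_base64_dir("/input_base64", "/output_image")` would write `/output_image/t012a0d0bef441bf584.jpg` for an input file `/input_base64/t012a0d0bef441bf584`. The `.jpg` suffix follows the path example above; adjust it if the decoded images turn out to use a different format.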
For Flickr30k-CNA, please download the corresponding images from Flickr30k.