We organize the datasets as follows.
For each JSON in the pre-training dataset, we give the meaning of each attribute.
- "Url" is the URL link of the original website.
- "Title" is the core content of the image from the website.
- "Content" is a detailed description surrounding the image on the web page.
- "Keywords" are the keywords used when searching on a search engine.
- "ImageKey" is the unique identifier of each sample.
- "ImageUrl" represents the URL link to the image.
- "ImageQuery" is the search query assigned to the image according to the user's behavior.
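
As a minimal sketch of how such a record could be read and checked, the snippet below parses one JSON object and verifies that the seven attributes listed above are present. The helper name `parse_pretrain_record` and all sample values are hypothetical, and it assumes each record is a single JSON object; the actual file layout may differ.

```python
import json

# Attribute names taken from the list above.
PRETRAIN_FIELDS = (
    "Url", "Title", "Content", "Keywords",
    "ImageKey", "ImageUrl", "ImageQuery",
)

def parse_pretrain_record(raw: str) -> dict:
    """Parse one JSON record and check that every documented attribute exists.

    Hypothetical helper: assumes one JSON object per record.
    """
    record = json.loads(raw)
    missing = [name for name in PRETRAIN_FIELDS if name not in record]
    if missing:
        raise ValueError(f"record is missing attributes: {missing}")
    return record

# Made-up sample values, for illustration only.
sample = json.dumps({
    "Url": "https://example.com/page",
    "Title": "placeholder title",
    "Content": "placeholder surrounding text",
    "Keywords": "placeholder keywords",
    "ImageKey": "placeholder-key",
    "ImageUrl": "https://example.com/image.jpg",
    "ImageQuery": "placeholder query",
})
record = parse_pretrain_record(sample)
```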
For each JSON in ICM and IQM, we provide the meaning of each attribute.
- "text" is the corresponding textual description.
- "image_path" is the relative path of the image, which contains the unique attribute.
- "label" indicates whether the image-text pair matches.
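
A short sketch of loading ICM or IQM samples and selecting the matching pairs might look as follows. It assumes one JSON object per line and that a matching pair is encoded as `1` and a non-matching pair as `0`; the label encoding is an assumption, not something stated above. `MatchSample`, `load_match_samples`, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass
class MatchSample:
    text: str
    image_path: str
    label: int  # assumed encoding: 1 = match, 0 = no match

def load_match_samples(lines):
    """Turn JSON lines into MatchSample objects (hypothetical helper)."""
    samples = []
    for line in lines:
        obj = json.loads(line)
        samples.append(
            MatchSample(obj["text"], obj["image_path"], int(obj["label"]))
        )
    return samples

# Made-up records, for illustration only.
lines = [
    json.dumps({"text": "caption A", "image_path": "images/a.jpg", "label": 1}),
    json.dumps({"text": "caption B", "image_path": "images/b.jpg", "label": 0}),
]
samples = load_match_samples(lines)
positives = [s for s in samples if s.label == 1]
```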
For each JSON in ICR and IQR, we give the meaning of each attribute.
- "text" is the corresponding textual description.
- "image_path" is the relative path of the image, which contains the unique attribute.
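
For retrieval, it is often convenient to know which texts belong to which image. The sketch below groups the "text" values by "image_path", assuming one JSON object per line; the function name `build_image_to_texts` and the sample records are hypothetical.

```python
import json
from collections import defaultdict

def build_image_to_texts(lines):
    """Map each image_path to the list of texts paired with it."""
    image_to_texts = defaultdict(list)
    for line in lines:
        obj = json.loads(line)
        image_to_texts[obj["image_path"]].append(obj["text"])
    return dict(image_to_texts)

# Made-up records, for illustration only.
retrieval_lines = [
    json.dumps({"text": "text one", "image_path": "images/x.jpg"}),
    json.dumps({"text": "text two", "image_path": "images/x.jpg"}),
    json.dumps({"text": "text three", "image_path": "images/y.jpg"}),
]
image_to_texts = build_image_to_texts(retrieval_lines)
```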
In Flickr30k-CNA, we give re-translated Chinese texts as follows.
6228559981 在城市中,一个面带微笑的小孩跑过从地面喷出的喷泉,人们看着他在水中奔跑。
6228559981 一个穿着绿色t恤的黑发男孩在一个喷泉的水里玩,另外两个人在看着他。
6228559981 一个身穿彩色t恤的小男孩正穿过位于河边的街道喷泉。
6228559981 一个孩子跑过喷泉。
6228559981 一个男孩在水上公园玩。
6244908705 一个穿着黄色衬衫、戴着头盔的男人正沿着布满灰尘的小路骑着自行车。
6244908705 一名身穿黄色衬衫、头戴黑色头盔的男子正骑着自行车穿过一条小路。
6244908705 一个戴着黑色头盔的男人,正骑着一辆山地自行车。
6244908705 一个男人正在一条小路上骑山地自行车。
6244908705 那个男人正沿着小路骑自行车。
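
The captions can be grouped by image ID with a few lines of Python. This is a sketch that assumes each line of the caption file holds an image ID, whitespace, and then one caption; the function name `group_captions` is hypothetical, and the sample lines are copied from the excerpt above.

```python
from collections import defaultdict

def group_captions(lines):
    """Group captions by the image ID that starts each line."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first run of whitespace: ID, then caption.
        image_id, caption = line.split(maxsplit=1)
        captions[image_id].append(caption)
    return dict(captions)

sample_lines = [
    "6228559981 一个孩子跑过喷泉。",
    "6228559981 一个男孩在水上公园玩。",
    "6244908705 那个男人正沿着小路骑自行车。",
]
grouped = group_captions(sample_lines)
```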