Learning Transferable Visual Models From Natural Language Supervision

Published: 2021

Recommended citation: Radford, Alec & Kim, Jong Wook, et al. "Learning Transferable Visual Models From Natural Language Supervision." (2021). https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf

Summary: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability, since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
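The caption-matching pre-training task described above is a symmetric contrastive objective: within a batch of paired (image, text) embeddings, each image should score highest against its own caption and vice versa. A minimal NumPy sketch of that kind of loss is below; the function name, embedding shapes, and temperature value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    image_emb, text_emb: (N, D) arrays where row i of each is a matching
    (image, text) pair. This is an illustrative sketch of a CLIP-style
    objective, not the authors' code.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # N x N similarity matrix, scaled by a temperature parameter.
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(l):
        # The correct pairings sit on the diagonal.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly matched batches drive the loss toward zero, while randomly paired embeddings score near the chance level of log N.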

Read the paper here
