Prompt-based Image Filtering with CLIP

Filter and rank images by text prompts with CLIP models

Overview

🔥🔥🔥 Check out our YouTube tutorial and the complete guide in our blog.


This app allows you to quickly and easily filter and rank images in Supervisely datasets by text prompts. It uses a CLIP model to predict the relevance of each image to a given text prompt, which is useful for filtering or ranking images by their content. The relevance (CLIP score) of each image to the prompt is shown in a table. You can filter images by relevance, sort them by it, or do both at the same time, and then upload the results to a new dataset.
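For intuition, here is a minimal sketch of how such a CLIP score can be computed with the OpenCLIP library (the model choice, file name, and prompt below are placeholders, not the app's exact code):

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP model and its matching preprocessing transform (placeholder choice).
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder file
text = tokenizer(["a photo of a dog"])                      # placeholder prompt

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity of the normalized embeddings serves as the relevance score.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    score = (image_features @ text_features.T).item()

print(f"CLIP score: {score:.4f}")
```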

Pretrained models

We have selected several pre-trained models from the OpenCLIP repository:

| Model | Pretrained | Top-1 accuracy on ImageNet | Size |
|---|---|---|---|
| coca_ViT-L-14 | mscoco_finetuned_laion2B-s13B-b90k | - | 2.55 GB |
| coca_ViT-L-14 | laion2B-s13B-b90k | 75.5% | 2.55 GB |
| ViT-L-14 | openai | 75.5% | 933 MB |
| ViT-L-14 | laion2b_s32b_b82k | 75.3% | 933 MB |
| ViT-L-14-336 | openai | - | 933 MB |
| ViT-g-14 | laion2b_s34b_b88k | 78.5% | 5.47 GB |
| ViT-bigG-14 | laion2b_s39b_b160k | 80.1% | 10.2 GB |
| convnext_base_w | laion2b_s13b_b82k_augreg | 71.5% | 718 MB |
| convnext_large_d_320 | laion2b_s29b_b131k_ft_soup | 76.9% | 1.41 GB |
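The Model and Pretrained columns map directly to the arguments of open_clip.create_model_and_transforms, so any row of the table can be loaded along these lines (a sketch; pick whichever row fits your hardware):

```python
import open_clip

# "Model" and "Pretrained" values taken from one row of the table above.
model, _, preprocess = open_clip.create_model_and_transforms(
    "convnext_base_w", pretrained="laion2b_s13b_b82k_augreg"
)
```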

How To Run

Step 0: Run the application from the Ecosystem or from the context menu of an images project or dataset.
Note: if you don't run the app from the context menu of a dataset, you first need to specify the dataset to work with: select it in the Input dataset section, then click the Load data button under the dataset selector. The app will load the dataset and generate a table with all of its images. Once the data is loaded, the dataset selector stays locked until you click the Change dataset button.

Step 1: Choose the desired Model and select the Batch size (if the default value of 32 doesn't suit your needs). You can uncheck the Enable JIT checkbox if you want to use the model without JIT compilation.
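Roughly speaking, the two controls correspond to inference settings like the ones below (a hedged sketch: open_clip exposes a jit flag when loading models, and the batch size is simply how many images are encoded per forward pass):

```python
import torch
import open_clip

BATCH_SIZE = 32    # mirrors the app's Batch size control (assumption)
ENABLE_JIT = True  # mirrors the Enable JIT checkbox (assumption)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai", device=device, jit=ENABLE_JIT
)
```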



Step 2: Enter the text prompt in the Text prompt field; the prompt can be a single word or a phrase. Then click the Start Inference button. The app will first download the chosen model and then run inference on the images with the specified batch size. You can stop the inference process at any time by clicking the Cancel inference button.
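Conceptually, the batched inference loop looks something like this sketch (a hypothetical helper, not the app's actual code):

```python
import torch
from PIL import Image

def clip_scores(image_paths, prompt, model, preprocess, tokenizer,
                batch_size=32, device="cpu"):
    """Score every image against one text prompt, batch_size images at a time."""
    with torch.no_grad():
        # Encode the prompt once and normalize it.
        text_feat = model.encode_text(tokenizer([prompt]).to(device))
        text_feat /= text_feat.norm(dim=-1, keepdim=True)

        scores = []
        for i in range(0, len(image_paths), batch_size):
            batch = torch.stack([
                preprocess(Image.open(p).convert("RGB"))
                for p in image_paths[i:i + batch_size]
            ]).to(device)
            img_feat = model.encode_image(batch)
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            # Cosine similarity of each image embedding with the prompt embedding.
            scores.extend((img_feat @ text_feat.T).squeeze(1).tolist())
    return scores
```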



Step 3: After the inference is finished, the next section of the app is unlocked. The chart shows the CLIP score (Y-axis) for each image (the X-axis is the image index), with images sorted by score in descending order. This gives you an intuition of what kind of data you have in general (e.g. how many images fall within a score range) and helps you select a threshold. You can also see a table with all images from the dataset and their scores, sorted by score in descending order by default. Press the Select button in a table row to preview the image; this can be handy for finding the optimal threshold for image filtering.
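To reproduce that intuition outside the app, you could plot the sorted scores yourself (a sketch; the threshold value is a made-up example and `scores` comes from a helper like the one above):

```python
import matplotlib.pyplot as plt

sorted_scores = sorted(scores, reverse=True)
plt.plot(sorted_scores)
plt.xlabel("image index (sorted)")
plt.ylabel("CLIP score")
plt.axhline(0.25, linestyle="--", color="gray", label="candidate threshold")  # example value
plt.legend()
plt.show()
```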



Step 4: In the final step, select the output dataset for the images (note that the input dataset itself cannot be selected) and define how the app should handle them: filter, sort, or both (see the sketch after this list). At least one of the following options must be selected:

  • Filter images - the app will keep only images whose score is above or below the selected Threshold; you choose which side of the threshold to keep.
  • Sort images - the app will sort images by score and upload them to the output dataset in descending or ascending order.
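Logically, the two options combine like this (a sketch; the threshold and flags are placeholder values):

```python
THRESHOLD = 0.25    # placeholder, chosen from the chart/table
KEEP_ABOVE = True   # the "above threshold" choice of the Filter option
DESCENDING = True   # the order choice of the Sort option

pairs = list(zip(image_paths, scores))
kept = [(p, s) for p, s in pairs if (s >= THRESHOLD) == KEEP_ABOVE]  # Filter images
kept.sort(key=lambda ps: ps[1], reverse=DESCENDING)                  # Sort images
```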

You can also check the Add confidence tag checkbox to add a tag with the confidence score to each image in the output dataset. Note that this option can slow down the upload.
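For illustration, tagging images with a score via the Supervisely Python SDK could look roughly like this (a hedged sketch, assuming a "confidence" tag meta already exists in the project; the IDs and values are placeholders, and the extra per-image API call is what makes the upload slower):

```python
import supervisely as sly

api = sly.Api.from_env()

PROJECT_ID = 123          # placeholder: destination project id
uploaded = {456: 0.3141}  # placeholder: image id -> CLIP score

# Look up the tag meta in the destination project (assumed to already exist).
project_meta = sly.ProjectMeta.from_json(api.project.get_meta(PROJECT_ID))
tag_meta = project_meta.get_tag_meta("confidence")

for image_id, score in uploaded.items():
    api.image.add_tag(image_id, tag_meta.sly_id, value=round(score, 4))
```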

After the upload is finished, you will see a message with the number of images that were successfully uploaded, along with links to the project and the dataset they were uploaded to. You can click the links to open them.

After you finish using the app, don't forget to stop the app session manually in App Sessions. The app writes the text prompt and the CLIP score to each image's metadata; you can find this information in the Image Properties - Info section of the image in the labeling tool.

Acknowledgment

This app is based on the great work CLIP (https://github.com/openai/CLIP) and its open-source implementation OpenCLIP (https://github.com/mlfoundations/open_clip).