prithivMLmods/MetaCLIP-2-Open-Scene

@@ -11,6 +11,16 @@ tags:
 - open-scene
 ---
 ```
 Classification Report:
               precision    recall  f1-score   support
@@ -27,4 +37,87 @@ Classification Report:
 weighted avg     0.9706    0.9706    0.9706     16345
 ```
-![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/gJUwvNsxBQAh30FprXlyP.png)

+![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Cwn-cWX3RDmAhywdLgocX.png)
+# **MetaCLIP-2-Open-Scene**
+> **MetaCLIP-2-Open-Scene** is a vision-language encoder fine-tuned from **[facebook/metaclip-2-worldwide-s16](https://huggingface.co/facebook/metaclip-2-worldwide-s16)** for single-label image classification.
+> It uses the **MetaClip2ForImageClassification** architecture to assign each image to one of six natural or urban scene categories.
+>[!note]
+> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062
 ```
 Classification Report:
               precision    recall  f1-score   support
 weighted avg     0.9706    0.9706    0.9706     16345
 ```
+![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/gJUwvNsxBQAh30FprXlyP.png)
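The `weighted avg` row can be read as a support-weighted mean, assuming the report follows the scikit-learn `classification_report` layout: each class's metric is weighted by the number of test images in that class. A minimal sketch of that calculation follows; the per-class values in the usage line are made-up placeholders, not figures from this report.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], supports: Sequence[int]) -> float:
    """Support-weighted mean of a per-class metric (precision, recall, or F1)."""
    if len(values) != len(supports):
        raise ValueError("values and supports must have the same length")
    total = sum(supports)
    if total == 0:
        raise ValueError("total support must be positive")
    return sum(v * s for v, s in zip(values, supports)) / total


# Placeholder inputs for illustration only (not taken from the report above).
example = weighted_average([1.0, 0.5], [1, 3])
print(example)
```

Under this weighting, classes with more test images pull the average toward their own score, which is why it can differ from the unweighted macro average.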
+The model classifies images into six open-scene categories:
+* **Class 0:** "buildings"
+* **Class 1:** "forest"
+* **Class 2:** "glacier"
+* **Class 3:** "mountain"
+* **Class 4:** "sea"
+* **Class 5:** "street"
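The index-to-label mapping above can be written down as a small lookup table. The sketch below uses only the standard library to turn a list of six logits into a label and probability; the helper names (`ID2LABEL`, `softmax`, `decode`) and the sample logits are illustrative assumptions, not part of the model's API.

```python
import math
from typing import Dict, List, Tuple

# Index-to-label mapping copied from the class list above.
ID2LABEL: Dict[int, str] = {
    0: "buildings",
    1: "forest",
    2: "glacier",
    3: "mountain",
    4: "sea",
    5: "street",
}
LABEL2ID: Dict[str, int] = {label: idx for idx, label in ID2LABEL.items()}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, chosen only to exercise the function.
label, prob = decode([0.1, 0.2, 5.0, 0.3, 0.0, -1.0])
print(label, round(prob, 3))
```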
+# **Run with Transformers**
+```python
+# Notebook cell syntax; in a terminal, run the same command without the leading "!"
+!pip install -q transformers torch pillow gradio
+```
+```python
+import gradio as gr
+import torch
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+# Load model and processor
+model_name = "prithivMLmods/MetaCLIP-2-Open-Scene"
+model = AutoModelForImageClassification.from_pretrained(model_name)
+processor = AutoImageProcessor.from_pretrained(model_name)
+def scene_classification(image):
+    """Predicts the type of scene represented in an image."""
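+    # Gradio passes None when no image has been provided
+    if image is None:
+        return None
+    # Gradio passes a NumPy array; convert it to an RGB PIL image
+    image = Image.fromarray(image).convert("RGB")
+    inputs = processor(images=image, return_tensors="pt")
+    # Inference only, so skip gradient tracking
+    with torch.no_grad():
+        outputs = model(**inputs)
+        logits = outputs.logits
+        # Softmax over the class dimension, then drop the batch dimension
+        probs = torch.nn.functional.softmax(logits, dim=1).squeeze(0).tolist()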
+    labels = {
+        "0": "buildings",
+        "1": "forest",
+        "2": "glacier",
+        "3": "mountain",
+        "4": "sea",
+        "5": "street"
+    }
+    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
+    return predictions
+# Create Gradio interface
+iface = gr.Interface(
+    fn=scene_classification,
+    inputs=gr.Image(type="numpy"),
+    outputs=gr.Label(label="Prediction Scores"),
+    title="Open Scene Classification",
+    description="Upload an image to classify the scene type (e.g., forest, sea, street, or mountain)."
+)
+# Launch the app
+if __name__ == "__main__":
+    iface.launch()
+```
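For scripts or batch jobs that do not need a UI, the same model can be called directly. The following is a minimal sketch under a few assumptions: the function names `classify_file` and `top_prediction` are hypothetical, the label order is copied from this model card rather than read from the checkpoint, and the model is reloaded on every call for simplicity.

```python
from typing import Dict, Tuple

MODEL_NAME = "prithivMLmods/MetaCLIP-2-Open-Scene"
# Label order copied from this model card's class list.
LABELS = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def classify_file(path: str) -> Dict[str, float]:
    """Load the model, run one image file through it, return label -> probability."""
    # Heavy dependencies are imported here so the helper below works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
    model = AutoModelForImageClassification.from_pretrained(MODEL_NAME)

    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax over the class dimension; [0] selects the single image in the batch.
    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return dict(zip(LABELS, probs))


def top_prediction(scores: Dict[str, float]) -> Tuple[str, float]:
    """Pick the highest-scoring label from a label -> probability mapping."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Example call (the path is a placeholder):
# print(top_prediction(classify_file("photo.jpg")))
```

When classifying many images, it is usually better to load the processor and model once outside the function and reuse them.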
+# **Sample Inference:**
+![Screenshot 2025-11-13 at 19-39-55 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/9vHPyQsv3FeduOU1s4A6r.png)
+![Screenshot 2025-11-13 at 19-37-07 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/H_5f4qIB5XYQuOyeNGh_P.png)
+![Screenshot 2025-11-13 at 19-37-50 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/KM0hBDUFF5-zU9tCXoU1w.png)
+![Screenshot 2025-11-13 at 19-38-37 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/ErSRLvFr3fLGUTO86klOT.png)
+![Screenshot 2025-11-13 at 19-39-24 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/_NYjjKZXQx8fJGOyjhY4r.png)
+# **Intended Use:**
+The **MetaCLIP-2-Open-Scene** model assigns an image of a natural or urban environment to one of six scene categories.
+Potential use cases include:
+* **Geographical Image Analysis:** Categorizing landscapes for environmental and mapping studies.
+* **Tourism and Travel Applications:** Automatically tagging scenic photos for organization and recommendations.
+* **Autonomous Systems:** Providing coarse scene context to perception pipelines in robotics and self-driving systems.
+* **Environmental Monitoring:** Classifying geographic scene types for research.
+* **Media and Photography:** Assisting in photo organization and content-based retrieval.
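As one illustration of the photo-organization use case, per-image scores can be grouped into albums, with low-confidence images set aside for manual review. This is a sketch only: `group_by_scene`, the `unsorted` bucket, and the 0.8 cut-off are hypothetical choices, and the sample scores are made up rather than produced by the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_scene(
    predictions: Dict[str, Dict[str, float]],
    min_confidence: float = 0.8,  # hypothetical cut-off; tune on your own data
) -> Dict[str, List[str]]:
    """Group file names by top predicted scene; uncertain files go to 'unsorted'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename, scores in predictions.items():
        label = max(scores, key=scores.get)
        bucket = label if scores[label] >= min_confidence else "unsorted"
        groups[bucket].append(filename)
    return dict(groups)


# Made-up scores for illustration; real ones would come from the model.
sample = {
    "a.jpg": {"sea": 0.95, "glacier": 0.05},
    "b.jpg": {"forest": 0.55, "mountain": 0.45},
    "c.jpg": {"sea": 0.90, "street": 0.10},
}
albums = group_by_scene(sample)
print(albums)
```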