Update H1 in Explorer API docs (#7813)

2025-12-25 14:45:38 +08:00 · 2024-01-25 13:10:49 +05:00 · 2024-01-25 13:10:49 +05:00 · c460ad28f8
commit c460ad28f8
parent a2222b4283
3 changed files with 602 additions and 519 deletions
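
Most of the notebook hunks below make the same change: each cell's empty `metadata` object gains an `id` entry that repeats the cell's top-level `id`. The commit does not say how this was produced (the added `colab` key suggests the notebook was re-saved in Colab). As a rough sketch only, the same normalization could be scripted as follows; `mirror_cell_ids` and the sample notebook are hypothetical names and data, not part of the repository.

```python
import json


def mirror_cell_ids(notebook: dict) -> int:
    """Copy each cell's top-level "id" into cell["metadata"]["id"].

    Returns the number of cells that were changed.
    """
    changed = 0
    for cell in notebook.get("cells", []):
        cell_id = cell.get("id")
        if cell_id is None:
            continue  # nothing to mirror for cells without an id
        metadata = cell.setdefault("metadata", {})
        if metadata.get("id") != cell_id:
            metadata["id"] = cell_id
            changed += 1
    return changed


# Tiny in-memory notebook standing in for explorer.ipynb (illustrative data only).
sample = {
    "cells": [
        {"cell_type": "markdown", "id": "aa923c26", "metadata": {}, "source": ["# VOC Exploration Example\n"]},
        {"cell_type": "code", "id": "433f3a4d", "metadata": {}, "outputs": [], "source": []},
    ],
    "nbformat": 4,
}

changed_count = mirror_cell_ids(sample)
print(changed_count)
print(json.dumps(sample["cells"][0]["metadata"]))
```

Running the function a second time on the same notebook changes nothing, so a script like this would be safe to re-run.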
--- a/docs/en/datasets/explorer/api.md
+++ b/docs/en/datasets/explorer/api.md
@@ -314,11 +314,11 @@ plt.show()

 Start creating your own CV dataset exploration reports using the Explorer API. For inspiration, check out the

-# Apps Built Using Ultralytics Explorer
+## Apps Built Using Ultralytics Explorer

 Try our GUI Demo based on Explorer API

-# Coming Soon
+## Coming Soon

 - [ ] Merge specific labels from datasets. Example - Import all `person` labels from COCO and `car` labels from Cityscapes
 - [ ] Remove images that have a higher similarity index than the given threshold
--- a/docs/en/datasets/explorer/dashboard.md
+++ b/docs/en/datasets/explorer/dashboard.md
@@ -1,5 +1,5 @@
 ---
-comments: 5rue
+comments: true
 description: Learn about Ultralytics Explorer GUI for semantic search, SQL queries, and AI-powered natural language search in CV datasets.
 keywords: Ultralytics, Explorer GUI, semantic search, vector similarity search, AI queries, SQL queries, computer vision, dataset exploration, image search, OpenAI integration
 ---
--- a/docs/en/datasets/explorer/explorer.ipynb
+++ b/docs/en/datasets/explorer/explorer.ipynb
@@ -3,9 +3,11 @@
    {
      "cell_type": "markdown",
      "id": "aa923c26-81c8-4565-9277-1cb686e3702e",
-   "metadata": {},
+      "metadata": {
+        "id": "aa923c26-81c8-4565-9277-1cb686e3702e"
+      },
      "source": [
-    "# VOC Exploration Example \n",
+        "# VOC Exploration Example\n",
        "<div align=\"center\">\n",
        "\n",
        "  <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
@@ -31,7 +33,9 @@
    {
      "cell_type": "markdown",
      "id": "2454d9ba-9db4-4b37-98e8-201ba285c92f",
-   "metadata": {},
+      "metadata": {
+        "id": "2454d9ba-9db4-4b37-98e8-201ba285c92f"
+      },
      "source": [
        "## Setup\n",
        "Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
@@ -41,7 +45,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "433f3a4d-a914-42cb-b0b6-be84a84e5e41",
-   "metadata": {},
+      "metadata": {
+        "id": "433f3a4d-a914-42cb-b0b6-be84a84e5e41"
+      },
      "outputs": [],
      "source": [
        "%pip install ultralytics[explorer] openai\n",
@@ -53,7 +59,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "ae602549-3419-4909-9f82-35cba515483f",
-   "metadata": {},
+      "metadata": {
+        "id": "ae602549-3419-4909-9f82-35cba515483f"
+      },
      "outputs": [],
      "source": [
        "from ultralytics import Explorer"
@@ -62,9 +70,11 @@
    {
      "cell_type": "markdown",
      "id": "d8c06350-be8e-45cf-b3a6-b5017bbd943c",
-   "metadata": {},
+      "metadata": {
+        "id": "d8c06350-be8e-45cf-b3a6-b5017bbd943c"
+      },
      "source": [
-    "# Similarity search\n",
+        "## Similarity search\n",
        "Use vector similarity search to find similar data points in your dataset, along with their distance in the embedding space. Simply create an embeddings table for the given dataset-model pair. The table only needs to be built once and is reused automatically.\n"
      ]
    },
@@ -72,7 +82,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "334619da-6deb-4b32-9fe0-74e0a79cee20",
-   "metadata": {},
+      "metadata": {
+        "id": "334619da-6deb-4b32-9fe0-74e0a79cee20"
+      },
      "outputs": [],
      "source": [
        "exp = Explorer(\"VOC.yaml\", model=\"yolov8n.pt\")\n",
@@ -82,7 +94,9 @@
    {
      "cell_type": "markdown",
      "id": "b6c5e42d-bc7e-4b4c-bde0-643072a2165d",
-   "metadata": {},
+      "metadata": {
+        "id": "b6c5e42d-bc7e-4b4c-bde0-643072a2165d"
+      },
      "source": [
        "Once the embeddings table is built, you can run semantic search in any of the following ways:\n",
        "- On a given index / list of indices in the dataset like - `exp.get_similar(idx=[1,10], limit=10)`\n",
@@ -97,7 +111,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "b485f05b-d92d-42bc-8da7-5e361667b341",
-   "metadata": {},
+      "metadata": {
+        "id": "b485f05b-d92d-42bc-8da7-5e361667b341"
+      },
      "outputs": [],
      "source": [
        "similar = exp.get_similar(idx=1, limit=10)\n",
@@ -107,7 +123,9 @@
    {
      "cell_type": "markdown",
      "id": "acf4b489-2161-4176-a1fe-d1d067d8083d",
-   "metadata": {},
+      "metadata": {
+        "id": "acf4b489-2161-4176-a1fe-d1d067d8083d"
+      },
      "source": [
        "You can also plot the similar samples directly using the `plot_similar` util\n",
        "<p>\n",
@@ -120,7 +138,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "9dbfe7d0-8613-4529-adb6-6e0632d7cce7",
-   "metadata": {},
+      "metadata": {
+        "id": "9dbfe7d0-8613-4529-adb6-6e0632d7cce7"
+      },
      "outputs": [],
      "source": [
        "exp.plot_similar(idx=6500, limit=20)\n",
@@ -131,7 +151,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "260e09bf-4960-4089-a676-cb0e76ff3c0d",
-   "metadata": {},
+      "metadata": {
+        "id": "260e09bf-4960-4089-a676-cb0e76ff3c0d"
+      },
      "outputs": [],
      "source": [
        "exp.plot_similar(img=\"https://ultralytics.com/images/bus.jpg\", limit=10, labels=False) # Can also pass any external images\n"
@@ -140,7 +162,9 @@
    {
      "cell_type": "markdown",
      "id": "faa0b7a7-6318-40e4-b0f4-45a8113bdc3a",
-   "metadata": {},
+      "metadata": {
+        "id": "faa0b7a7-6318-40e4-b0f4-45a8113bdc3a"
+      },
      "source": [
        "<p>\n",
        "<img  src=\"https://github.com/AyushExel/assets/assets/15766192/8e011195-b0da-43ef-b3cd-5fb6f383037e\">\n",
@@ -151,7 +175,9 @@
    {
      "cell_type": "markdown",
      "id": "0cea63f1-71f1-46da-af2b-b1b7d8f73553",
-   "metadata": {},
+      "metadata": {
+        "id": "0cea63f1-71f1-46da-af2b-b1b7d8f73553"
+      },
      "source": [
        "## 2. Ask AI: Search or filter with Natural Language\n",
        "You can prompt the Explorer object with the kind of data points you want to see and it'll try to return a dataframe with those. Because it is powered by LLMs, it doesn't always get it right. In that case, it'll return None.\n",
@@ -166,7 +192,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "92fb92ac-7f76-465a-a9ba-ea7492498d9c",
-   "metadata": {},
+      "metadata": {
+        "id": "92fb92ac-7f76-465a-a9ba-ea7492498d9c"
+      },
      "outputs": [],
      "source": [
        "df = exp.ask_ai(\"show me images containing more than 10 objects with at least 2 persons\")\n",
@@ -176,7 +204,9 @@
    {
      "cell_type": "markdown",
      "id": "f2a7d26e-0ce5-4578-ad1a-b1253805280f",
-   "metadata": {},
+      "metadata": {
+        "id": "f2a7d26e-0ce5-4578-ad1a-b1253805280f"
+      },
      "source": [
        "To plot these results, you can use the `plot_query_result` util\n",
        "Example:\n",
@@ -194,7 +224,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "b1cfab84-9835-4da0-8e9a-42b30cf84511",
-   "metadata": {},
+      "metadata": {
+        "id": "b1cfab84-9835-4da0-8e9a-42b30cf84511"
+      },
      "outputs": [],
      "source": [
        "# plot\n",
@@ -208,7 +240,9 @@
    {
      "cell_type": "markdown",
      "id": "35315ae6-d827-40e4-8813-279f97a83b34",
-   "metadata": {},
+      "metadata": {
+        "id": "35315ae6-d827-40e4-8813-279f97a83b34"
+      },
      "source": [
        "## 3. Run SQL queries on your Dataset!\n",
        "Sometimes you might want to investigate certain types of entries in your dataset. For this, Explorer allows you to execute SQL queries.\n",
@@ -227,7 +261,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "8cd1072f-3100-4331-a0e3-4e2f6b1005bf",
-   "metadata": {},
+      "metadata": {
+        "id": "8cd1072f-3100-4331-a0e3-4e2f6b1005bf"
+      },
      "outputs": [],
      "source": [
        "table = exp.sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\")\n",
@@ -237,7 +273,9 @@
    {
      "cell_type": "markdown",
      "id": "debf8a00-c9f6-448b-bd3b-454cf62f39ab",
-   "metadata": {},
+      "metadata": {
+        "id": "debf8a00-c9f6-448b-bd3b-454cf62f39ab"
+      },
      "source": [
        "Just like similarity search, you also get a util to directly plot the results of SQL queries using `exp.plot_sql_query`\n",
        "<img src=\"https://github.com/AyushExel/assets/assets/15766192/f8b66629-8dd0-419e-8f44-9837969ba678\">\n"
@@ -247,7 +285,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "18b977e7-d048-4b22-b8c4-084a03b04f23",
-   "metadata": {},
+      "metadata": {
+        "id": "18b977e7-d048-4b22-b8c4-084a03b04f23"
+      },
      "outputs": [],
      "source": [
        "exp.plot_sql_query(\"WHERE labels LIKE '%person, person%' AND labels LIKE '%dog%' LIMIT 10\", labels=True)"
@@ -256,7 +296,9 @@
    {
      "cell_type": "markdown",
      "id": "f26804c5-840b-4fd1-987f-e362f29e3e06",
-   "metadata": {},
+      "metadata": {
+        "id": "f26804c5-840b-4fd1-987f-e362f29e3e06"
+      },
      "source": [
        "## 3. Working with embeddings Table (Advanced)\n",
        "Explorer works on [LanceDB](https://lancedb.github.io/lancedb/) tables internally. You can access this table directly using the `Explorer.table` object and run raw queries, push down pre- and post-filters, etc."
@@ -266,7 +308,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "ea69260a-3407-40c9-9f42-8b34a6e6af7a",
-   "metadata": {},
+      "metadata": {
+        "id": "ea69260a-3407-40c9-9f42-8b34a6e6af7a"
+      },
      "outputs": [],
      "source": [
        "table = exp.table\n",
@@ -276,7 +320,9 @@
    {
      "cell_type": "markdown",
      "id": "238db292-8610-40b3-9af7-dfd6be174892",
-   "metadata": {},
+      "metadata": {
+        "id": "238db292-8610-40b3-9af7-dfd6be174892"
+      },
      "source": [
        "### Run raw queries\n",
        "Vector search finds the nearest vectors in the database. In a recommendation system or search engine, you can use it to find products similar to the one you searched for. In LLM and other AI applications, each data point can be represented by embeddings generated by a model, and vector search returns the most relevant features.\n",
@@ -297,17 +343,21 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "d74430fe-5aee-45a1-8863-3f2c31338792",
-   "metadata": {},
+      "metadata": {
+        "id": "d74430fe-5aee-45a1-8863-3f2c31338792"
+      },
      "outputs": [],
      "source": [
-    "dummy_img_embedding = [i for i in range(256)] \n",
+        "dummy_img_embedding = [i for i in range(256)]\n",
        "table.search(dummy_img_embedding).limit(5).to_pandas()"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "587486b4-0d19-4214-b994-f032fb2e8eb5",
-   "metadata": {},
+      "metadata": {
+        "id": "587486b4-0d19-4214-b994-f032fb2e8eb5"
+      },
      "source": [
        "### Inter-conversion to popular data formats"
      ]
@@ -316,7 +366,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "bb2876ea-999b-4eba-96bc-c196ba02c41c",
-   "metadata": {},
+      "metadata": {
+        "id": "bb2876ea-999b-4eba-96bc-c196ba02c41c"
+      },
      "outputs": [],
      "source": [
        "df = table.to_pandas()\n",
@@ -326,7 +378,9 @@
    {
      "cell_type": "markdown",
      "id": "42659d63-ad76-49d6-8dfc-78d77278db72",
-   "metadata": {},
+      "metadata": {
+        "id": "42659d63-ad76-49d6-8dfc-78d77278db72"
+      },
      "source": [
        "### Work with Embeddings\n",
        "You can access the raw embeddings from the LanceDB table and analyse them. The image embeddings are stored in the `vector` column"
@@ -336,7 +390,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "66d69e9b-046e-41c8-80d7-c0ee40be3bca",
-   "metadata": {},
+      "metadata": {
+        "id": "66d69e9b-046e-41c8-80d7-c0ee40be3bca"
+      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
@@ -348,7 +404,9 @@
    {
      "cell_type": "markdown",
      "id": "e8df0a49-9596-4399-954b-b8ae1fd7a602",
-   "metadata": {},
+      "metadata": {
+        "id": "e8df0a49-9596-4399-954b-b8ae1fd7a602"
+      },
      "source": [
        "### Scatterplot\n",
        "One of the preliminary steps in analysing embeddings is plotting them in 2D space via dimensionality reduction. Let's try an example\n",
@@ -360,7 +418,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "d9a150e8-8092-41b3-82f8-2247f8187fc8",
-   "metadata": {},
+      "metadata": {
+        "id": "d9a150e8-8092-41b3-82f8-2247f8187fc8"
+      },
      "outputs": [],
      "source": [
        "!pip install scikit-learn --q"
@@ -370,7 +430,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "196079c3-45a9-4325-81ab-af79a881e37a",
-   "metadata": {},
+      "metadata": {
+        "id": "196079c3-45a9-4325-81ab-af79a881e37a"
+      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
@@ -400,7 +462,9 @@
    {
      "cell_type": "markdown",
      "id": "1c843c23-e3f2-490e-8d6c-212fa038a149",
-   "metadata": {},
+      "metadata": {
+        "id": "1c843c23-e3f2-490e-8d6c-212fa038a149"
+      },
      "source": [
        "## 4. Similarity Index\n",
        "Here's a simple example of an operation powered by the embeddings table. Explorer comes with a `similarity_index` operation-\n",
@@ -417,7 +481,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "953c2a5f-1b61-4acf-a8e4-ed08547dbafc",
-   "metadata": {},
+      "metadata": {
+        "id": "953c2a5f-1b61-4acf-a8e4-ed08547dbafc"
+      },
      "outputs": [],
      "source": [
        "exp.plot_similarity_index(max_dist=0.2, top_k=0.01)"
@@ -426,7 +492,9 @@
    {
      "cell_type": "markdown",
      "id": "28228a9a-b727-45b5-8ca7-8db662c0b937",
-   "metadata": {},
+      "metadata": {
+        "id": "28228a9a-b727-45b5-8ca7-8db662c0b937"
+      },
      "source": [
        "Now let's look at the output of the operation"
      ]
@@ -435,7 +503,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "f4161aaa-20e6-4df0-8e87-d2293ee0530a",
-   "metadata": {},
+      "metadata": {
+        "id": "f4161aaa-20e6-4df0-8e87-d2293ee0530a"
+      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
@@ -447,7 +517,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "b01d5b1a-9adb-4c3c-a873-217c71527c8d",
-   "metadata": {},
+      "metadata": {
+        "id": "b01d5b1a-9adb-4c3c-a873-217c71527c8d"
+      },
      "outputs": [],
      "source": [
        "sim_idx"
@@ -456,7 +528,9 @@
    {
      "cell_type": "markdown",
      "id": "22b28e54-4fbb-400e-ad8c-7068cbba11c4",
-   "metadata": {},
+      "metadata": {
+        "id": "22b28e54-4fbb-400e-ad8c-7068cbba11c4"
+      },
      "source": [
        "Let's create a query to see which data points have a similarity count of more than 30 and plot images similar to them."
      ]
@@ -465,7 +539,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "58d2557b-d401-43cf-937d-4f554c7bc808",
-   "metadata": {},
+      "metadata": {
+        "id": "58d2557b-d401-43cf-937d-4f554c7bc808"
+      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
@@ -477,7 +553,9 @@
    {
      "cell_type": "markdown",
      "id": "a5ec8d76-271a-41ab-ac74-cf8c0084ba5e",
-   "metadata": {},
+      "metadata": {
+        "id": "a5ec8d76-271a-41ab-ac74-cf8c0084ba5e"
+      },
      "source": [
        "You should see something like this\n",
        "<img src=\"https://github.com/AyushExel/assets/assets/15766192/649bc366-ca2d-46ea-bfd9-3097cf575584\">\n"
@@ -487,7 +565,9 @@
      "cell_type": "code",
      "execution_count": null,
      "id": "3a7b2ee3-9f35-48a2-9c38-38379516f4d2",
-   "metadata": {},
+      "metadata": {
+        "id": "3a7b2ee3-9f35-48a2-9c38-38379516f4d2"
+      },
      "outputs": [],
      "source": [
        "exp.plot_similar(idx=[7146, 14035]) # Using avg embeddings of 2 images"
@@ -511,6 +591,9 @@
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.6"
+    },
+    "colab": {
+      "provenance": []
    }
  },
  "nbformat": 4,