|
8 | 8 | "source": [
|
9 | 9 | "# 🖼️ Introduction to Multimodal Text Generation\n",
|
10 | 10 | "\n",
|
11 | | - "In this notebook, we introduce the experimental features we've developed so far for multimodal text generation in Haystack. The experiment is ongoing, so expect more in the future.\n", |
| 11 | + "In this notebook, we introduce the features that enable multimodal text generation in Haystack.\n", |
12 | 12 | "\n",
|
13 | 13 | "- We introduced the `ImageContent` dataclass, which represents the image content of a user `ChatMessage`.\n",
|
14 | 14 | "- We developed some image converter components.\n",
|
|
35 | 35 | },
|
36 | 36 | "outputs": [],
|
37 | 37 | "source": [
|
38 | | - "!pip install \"haystack-experimental\" gdown nest_asyncio pillow pypdfium2 python-weather" |
| 38 | + "!pip install haystack-ai gdown nest_asyncio pillow pypdfium2 python-weather" |
39 | 39 | ]
|
40 | 40 | },
|
41 | 41 | {
|
|
75 | 75 | "source": [
|
76 | 76 | "## Introduction to `ImageContent`\n",
|
77 | 77 | "\n",
|
78 | | - "[`ImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
| 78 | + "[`ImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
79 | 79 | "\n",
|
80 | 80 | "It has the following attributes:\n",
|
81 | 81 | "- `base64_image`: A base64 string representing the image.\n",
|
|
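As a quick, concrete illustration of these attributes, here is a minimal sketch of building an `ImageContent` and sending it to a vision-capable chat model. It assumes the `ImageContent.from_file_path` helper mentioned later in this notebook and a `content_parts` argument on `ChatMessage.from_user` for mixing text and image parts; check the current API reference before relying on these exact signatures.

```python
# Minimal sketch (assumed API: ImageContent.from_file_path and the
# content_parts argument of ChatMessage.from_user).
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.components.generators.chat import OpenAIChatGenerator

# Load a local image; a low detail hint keeps token usage down.
image = ImageContent.from_file_path("capybara.jpg", detail="low")

# A user message can carry both text and image parts.
message = ChatMessage.from_user(
    content_parts=["Describe this image in one sentence.", image]
)

llm = OpenAIChatGenerator(model="gpt-4o-mini")
print(llm.run(messages=[message])["replies"][0].text)
```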
129 | 129 | },
|
130 | 130 | "outputs": [],
|
131 | 131 | "source": [
|
132 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
133 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 132 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
| 133 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
134 | 134 | "import base64\n",
|
135 | 135 | "\n",
|
136 | 136 | "with open(\"capybara.jpg\", \"rb\") as fd:\n",
|
|
364 | 364 | "## Image Converters for `ImageContent`\n",
|
365 | 365 | "\n",
|
366 | 366 | "To perform image conversion in multimodal pipelines, we also introduced two image converters:\n",
|
367 | | - "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
368 | | - "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
| 367 | + "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
| 368 | + "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
369 | 369 | ]
|
370 | 370 | },
|
371 | 371 | {
|
|
376 | 376 | },
|
377 | 377 | "outputs": [],
|
378 | 378 | "source": [
|
379 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
| 379 | + "from haystack.components.converters.image import ImageFileToImageContent\n", |
380 | 380 | "\n",
|
381 | 381 | "converter = ImageFileToImageContent(detail=\"low\", size=(300, 300))\n",
|
382 | 382 | "result = converter.run(sources=[\"capybara.jpg\"])"
|
|
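The converter returns its results under the `image_contents` key. As a quick follow-up to the cell above, the sketch below shows how those objects could be placed into a user message; the `content_parts` argument is an assumption, as in the earlier sketch.

```python
# Sketch: place the converted images into a user message
# (content_parts is an assumed argument of ChatMessage.from_user).
from haystack.dataclasses import ChatMessage

image_contents = result["image_contents"]  # output of converter.run above
message = ChatMessage.from_user(
    content_parts=["What do these images show?", *image_contents]
)
```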
477 | 477 | }
|
478 | 478 | ],
|
479 | 479 | "source": [
|
480 | | - "from haystack_experimental.components.converters.image import PDFToImageContent\n", |
| 480 | + "from haystack.components.converters.image import PDFToImageContent\n", |
481 | 481 | "\n",
|
482 | 482 | "pdf_converter = PDFToImageContent()\n",
|
483 | 483 | "paper_page_image = pdf_converter.run(sources=[\"flan_paper.pdf\"], page_range=\"9\")[\"image_contents\"][0]\n",
|
|
615 | 615 | }
|
616 | 616 | ],
|
617 | 617 | "source": [
|
618 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
| 618 | + "from haystack.components.builders import ChatPromptBuilder\n", |
619 | 619 | "\n",
|
620 | 620 | "builder = ChatPromptBuilder(template, required_variables=\"*\")\n",
|
621 | 621 | "\n",
|
|
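For context, the `template` passed to `ChatPromptBuilder` above is a Jinja-based chat template. A sketch of what a multimodal version might look like follows, assuming the `{% message %}` block syntax and the `templatize_part` filter for embedding `ImageContent` parts.

```python
# Sketch of a multimodal ChatPromptBuilder template (the {% message %}
# syntax and the templatize_part filter are assumptions noted above).
template = """
{% message role="user" %}
Answer the question using the attached images.
Question: {{ question }}
{% for img in image_contents %}
  {{ img | templatize_part }}
{% endfor %}
{% endmessage %}
"""
```

With `required_variables="*"`, every variable used in the template must be supplied when the builder runs, and the rendered messages are returned under the `prompt` key.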
786 | 786 | "outputs": [],
|
787 | 787 | "source": [
|
788 | 788 | "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n",
|
789 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
790 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
| 789 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
| 790 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
791 | 791 | "\n",
|
792 | 792 | "retriever = InMemoryBM25Retriever(document_store=document_store)\n",
|
793 | 793 | "llm = OpenAIChatGenerator(model=\"gpt-4o-mini\")\n",
|
|
891 | 891 | "]"
|
892 | 892 | ]
|
893 | 893 | },
|
894 | | - { |
895 | | - "cell_type": "markdown", |
896 | | - "metadata": { |
897 | | - "id": "hgPfD4sD0uqW" |
898 | | - }, |
899 | | - "source": [ |
900 | | - "### Using a Pipeline\n", |
901 | | - "\n", |
902 | | - "The retrieval part of the application showed above can also be implemented using a Pipeline.\n", |
903 | | - "\n", |
904 | | - "As you can see, there are some aspects to improve in terms of developer experience and we will work in this direction." |
905 | | - ] |
906 | | - }, |
907 | | - { |
908 | | - "cell_type": "code", |
909 | | - "execution_count": null, |
910 | | - "metadata": { |
911 | | - "colab": { |
912 | | - "base_uri": "https://localhost:8080/" |
913 | | - }, |
914 | | - "id": "6cFG-YQq0uWg", |
915 | | - "outputId": "4ad910e9-0d14-4f4c-9e2c-85e3051083c3" |
916 | | - }, |
917 | | - "outputs": [ |
918 | | - { |
919 | | - "data": { |
920 | | - "text/plain": [ |
921 | | - "<haystack.core.pipeline.pipeline.Pipeline object at 0x7c607962a550>\n", |
922 | | - "🚅 Components\n", |
923 | | - " - retriever: InMemoryBM25Retriever\n", |
924 | | - " - output_adapter: OutputAdapter\n", |
925 | | - " - image_converter: ImageFileToImageContent\n", |
926 | | - " - prompt_builder: ChatPromptBuilder\n", |
927 | | - " - generator: OpenAIChatGenerator\n", |
928 | | - "🛤️ Connections\n", |
929 | | - " - retriever.documents -> output_adapter.documents (List[Document])\n", |
930 | | - " - output_adapter.output -> image_converter.sources (List[str])\n", |
931 | | - " - image_converter.image_contents -> prompt_builder.image_contents (List[ImageContent])\n", |
932 | | - " - prompt_builder.prompt -> generator.messages (List[ChatMessage])" |
933 | | - ] |
934 | | - }, |
935 | | - "execution_count": 48, |
936 | | - "metadata": {}, |
937 | | - "output_type": "execute_result" |
938 | | - } |
939 | | - ], |
940 | | - "source": [ |
941 | | - "from typing import List\n", |
942 | | - "\n", |
943 | | - "from haystack import Pipeline\n", |
944 | | - "from haystack.components.converters import OutputAdapter\n", |
945 | | - "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n", |
946 | | - "\n", |
947 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
948 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
949 | | - "from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator\n", |
950 | | - "\n", |
951 | | - "\n", |
952 | | - "chat_template = \"\"\"\n", |
953 | | - "{% message role=\"user\" %}\n", |
954 | | - "{{query}}\n", |
955 | | - "{% for image_content in image_contents %}\n", |
956 | | - "  {{image_content | templatize_part}}\n", |
957 | | - "{% endfor %}\n", |
958 | | - "{% endmessage %}\n", |
959 | | - "\"\"\"\n", |
960 | | - "\n", |
961 | | - "output_adapter_template = \"\"\"\n", |
962 | | - "{%- set paths = [] -%}\n", |
963 | | - "{% for document in documents %}\n", |
964 | | - "  {%- set _ = paths.append(document.meta.image_path) -%}\n", |
965 | | - "{% endfor %}\n", |
966 | | - "{{paths}}\n", |
967 | | - "\"\"\"\n", |
968 | | - "\n", |
969 | | - "rag_pipeline = Pipeline()\n", |
970 | | - "\n", |
971 | | - "rag_pipeline.add_component(\"retriever\", InMemoryBM25Retriever(document_store=document_store, top_k=1))\n", |
972 | | - "rag_pipeline.add_component(\"output_adapter\", OutputAdapter(template=output_adapter_template, output_type=List[str]))\n", |
973 | | - "rag_pipeline.add_component(\"image_converter\", ImageFileToImageContent(detail=\"auto\"))\n", |
974 | | - "rag_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=chat_template, required_variables=\"*\"))\n", |
975 | | - "rag_pipeline.add_component(\"generator\", OpenAIChatGenerator(model=\"gpt-4o-mini\"))\n", |
976 | | - "\n", |
977 | | - "rag_pipeline.connect(\"retriever.documents\", \"output_adapter.documents\")\n", |
978 | | - "rag_pipeline.connect(\"output_adapter.output\", \"image_converter.sources\")\n", |
979 | | - "rag_pipeline.connect(\"image_converter.image_contents\", \"prompt_builder.image_contents\")\n", |
980 | | - "rag_pipeline.connect(\"prompt_builder.prompt\", \"generator.messages\")" |
981 | | - ] |
982 | | - }, |
983 | | - { |
984 | | - "cell_type": "code", |
985 | | - "execution_count": null, |
986 | | - "metadata": { |
987 | | - "colab": { |
988 | | - "base_uri": "https://localhost:8080/" |
989 | | - }, |
990 | | - "id": "uX-y-mC038c5", |
991 | | - "outputId": "b1b85e55-8dce-4254-e18f-df4f92c9aeeb" |
992 | | - }, |
993 | | - "outputs": [ |
994 | | - { |
995 | | - "name": "stdout", |
996 | | - "output_type": "stream", |
997 | | - "text": [ |
998 | | - "('The image illustrates how LoRA (Low-Rank Adaptation) systems learn \"intruder '\n", |
999 | | - " 'dimensions\"—singular vectors that differ from those in the pre-trained '\n", |
1000 | | - " 'weight matrix during fine-tuning. \\n'\n", |
1001 | | - " '\\n'\n", |
1002 | | - " '- **Panel (a)** shows the architecture of LoRA and full fine-tuning, '\n", |
1003 | | - " 'emphasizing the addition of learned parameters \\\\( B \\\\) and \\\\( A \\\\) in '\n", |
1004 | | - " 'LoRA.\\n'\n", |
1005 | | - " '- **Panel (b)** compares the cosine similarity of singular vectors from LoRA '\n", |
1006 | | - " \"and full fine-tuning, revealing that LoRA's learned vectors diverge more \"\n", |
1007 | | - " 'from pre-trained weights.\\n'\n", |
1008 | | - " '- **Panel (c)** depicts cosine similarity distributions, highlighting that '\n", |
1009 | | - " 'regular vectors stay consistent while intruder dimensions show significant '\n", |
1010 | | - " 'deviation.')\n" |
1011 | | - ] |
1012 | | - } |
1013 | | - ], |
1014 | | - "source": [ |
1015 | | - "query = \"What the image from the Lora vs Full Fine-tuning paper tries to show? Be short.\"\n", |
1016 | | - "\n", |
1017 | | - "response = rag_pipeline.run(data={\"query\": query})[\"generator\"][\"replies\"][0].text\n", |
1018 | | - "print(response)" |
1019 | | - ] |
1020 | | - }, |
1021 | 894 | {
|
1022 | 895 | "cell_type": "markdown",
|
1023 | 896 | "metadata": {
|
|
1046 | 919 | "\n",
|
1047 | 920 | "from haystack.tools import tool\n",
|
1048 | 921 | "from haystack.components.agents import Agent\n",
|
| 922 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
1049 | 923 | "\n",
|
1050 | | - "from haystack_experimental.dataclasses import ChatMessage, ImageContent\n", |
1051 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 924 | + "from haystack.dataclasses import ChatMessage, ImageContent\n", |
1052 | 925 | "import python_weather\n",
|
1053 | 926 | "\n",
|
1054 | 927 | "# only needed in Jupyter notebooks where there is an event loop running\n",
|
|
1197 | 1070 | "source": [
|
1198 | 1071 | "## What's next?\n",
|
1199 | 1072 | "\n",
|
1200 | | - "You can follow the progress of the Multimodal experiment in this [GitHub issue](https://github.com/deepset-ai/haystack/issues/8976).\n", |
| 1073 | + "We will release a notebook soon to show how to build more advanced multimodal pipelines, with a variety of different formats\n", |
| 1074 | + "and also using multimodal embedding models for retrieval.\n", |
1201 | 1075 | "\n",
|
1202 | | - "In the future, you can expect support for more LLM providers, improvements to multimodal indexing and retrieval pipelines, plus the exploration of other interesting directions.\n", |
| 1076 | + "We will also extend multimodal features to more model providers.\n", |
1203 | 1077 | "\n",
|
1204 | 1078 | "(*Notebook by [Stefano Fiorucci](https://github.com/anakin87)*)"
|
1205 | 1079 | ]
|
|