
Commit 04485b4

Merge pull request #248 from deepset-ai/update-multimodal-notebook
update multimodal intro notebook for 2.16 release
2 parents fc35dd9 + 53f25a1 commit 04485b4

File tree

2 files changed: +17 −145 lines


index.toml

Lines changed: 0 additions & 2 deletions

@@ -321,9 +321,7 @@ topics = ["Function Calling", "Agents"]
 title = "Introduction to Multimodal Text Generation"
 notebook = "multimodal_intro.ipynb"
 new = true
-experimental = true
 topics = ["Multimodal"]
-discuss = "https://github.com/deepset-ai/haystack-experimental/discussions/302"
 
 [[cookbook]]
 title = "Build a GitHub PR Creator Agent"

notebooks/multimodal_intro.ipynb

Lines changed: 17 additions & 143 deletions
@@ -8,7 +8,7 @@
   "source": [
    "# 🖼️ Introduction to Multimodal Text Generation\n",
    "\n",
-   "In this notebook, we introduce the experimental features we've developed so far for multimodal text generation in Haystack. The experiment is ongoing, so expect more in the future.\n",
+   "In this notebook, we introduce the features that enable multimodal text generation in Haystack.\n",
    "\n",
    "- We introduced the `ImageContent` dataclass, which represents the image content of a user `ChatMessage`.\n",
    "- We developed some image converter components.\n",
@@ -35,7 +35,7 @@
   },
   "outputs": [],
   "source": [
-   "!pip install \"haystack-experimental\" gdown nest_asyncio pillow pypdfium2 python-weather"
+   "!pip install haystack-ai gdown nest_asyncio pillow pypdfium2 python-weather"
   ]
  },
  {
@@ -75,7 +75,7 @@
   "source": [
    "## Introduction to `ImageContent`\n",
    "\n",
-   "[`ImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n",
+   "[`ImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n",
    "\n",
    "It has the following attributes:\n",
    "- `base64_image`: A base64 string representing the image.\n",
@@ -129,8 +129,8 @@
   },
   "outputs": [],
   "source": [
-   "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n",
-   "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n",
+   "from haystack.dataclasses import ImageContent, ChatMessage\n",
+   "from haystack.components.generators.chat import OpenAIChatGenerator\n",
    "import base64\n",
    "\n",
    "with open(\"capybara.jpg\", \"rb\") as fd:\n",
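The cell in the hunk above reads the image file and base64-encodes its bytes before constructing an `ImageContent`. The encoding step itself is plain standard library; here is a minimal sketch, with stand-in bytes in place of the real `capybara.jpg` file:

```python
import base64

# Stand-in for bytes read via open("capybara.jpg", "rb").read()
image_bytes = b"\x89PNG\r\n\x1a\nfake-image-payload"

# ImageContent's base64_image attribute holds a base64 string of the raw bytes
base64_image = base64.b64encode(image_bytes).decode("utf-8")

# Decoding the string recovers the original bytes exactly
assert base64.b64decode(base64_image) == image_bytes
```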
@@ -364,8 +364,8 @@
    "## Image Converters for `ImageContent`\n",
    "\n",
    "To perform image conversion in multimodal pipelines, we also introduced two image converters:\n",
-   "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n",
-   "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects."
+   "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n",
+   "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects."
   ]
  },
  {
@@ -376,7 +376,7 @@
   },
   "outputs": [],
   "source": [
-   "from haystack_experimental.components.converters.image import ImageFileToImageContent\n",
+   "from haystack.components.converters.image import ImageFileToImageContent\n",
    "\n",
    "converter = ImageFileToImageContent(detail=\"low\", size=(300, 300))\n",
    "result = converter.run(sources=[\"capybara.jpg\"])"
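The `size=(300, 300)` argument in the hunk above bounds the converted image's dimensions. The exact resize behavior belongs to Haystack's converter; the helper below is only a hypothetical sketch of the common fit-within-box, aspect-preserving computation:

```python
def fit_within(width: int, height: int, box: tuple[int, int]) -> tuple[int, int]:
    """Hypothetical fit-within-box resize: scale down preserving aspect ratio.

    Illustration only; NOT Haystack's actual ImageFileToImageContent logic.
    """
    max_w, max_h = box
    # Never upscale: cap the scale factor at 1.0
    scale = min(max_w / width, max_h / height, 1.0)
    return (round(width * scale), round(height * scale))

print(fit_within(600, 400, (300, 300)))  # (300, 200)
```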
@@ -477,7 +477,7 @@
    }
   ],
   "source": [
-   "from haystack_experimental.components.converters.image import PDFToImageContent\n",
+   "from haystack.components.converters.image import PDFToImageContent\n",
    "\n",
    "pdf_converter = PDFToImageContent()\n",
    "paper_page_image = pdf_converter.run(sources=[\"flan_paper.pdf\"], page_range=\"9\")[\"image_contents\"][0]\n",
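The hunk above passes `page_range="9"` to select a single PDF page. The parsing semantics live inside Haystack's `PDFToImageContent`; the function below is a hypothetical sketch of how such range strings (e.g. `"9"` or `"1-3,7"`) are typically expanded:

```python
def parse_page_range(page_range: str) -> list[int]:
    """Hypothetical expansion of a page-range string like "9" or "1-3,7".

    Illustration only; NOT Haystack's actual implementation.
    """
    pages: list[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if "-" in part:
            # A span like "1-3" expands to every page in the inclusive range
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages

print(parse_page_range("9"))      # [9]
print(parse_page_range("1-3,7"))  # [1, 2, 3, 7]
```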
@@ -615,7 +615,7 @@
    }
   ],
   "source": [
-   "from haystack_experimental.components.builders import ChatPromptBuilder\n",
+   "from haystack.components.builders import ChatPromptBuilder\n",
    "\n",
    "builder = ChatPromptBuilder(template, required_variables=\"*\")\n",
    "\n",
@@ -786,8 +786,8 @@
   "outputs": [],
   "source": [
    "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n",
-   "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n",
-   "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n",
+   "from haystack.components.generators.chat import OpenAIChatGenerator\n",
+   "from haystack.dataclasses import ImageContent, ChatMessage\n",
    "\n",
    "retriever = InMemoryBM25Retriever(document_store=document_store)\n",
    "llm = OpenAIChatGenerator(model=\"gpt-4o-mini\")\n",
@@ -891,133 +891,6 @@
    "]"
   ]
  },
- {
-  "cell_type": "markdown",
-  "metadata": {
-   "id": "hgPfD4sD0uqW"
-  },
-  "source": [
-   "### Using a Pipeline\n",
-   "\n",
-   "The retrieval part of the application showed above can also be implemented using a Pipeline.\n",
-   "\n",
-   "As you can see, there are some aspects to improve in terms of developer experience and we will work in this direction."
-  ]
- },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {
-   "colab": {
-    "base_uri": "https://localhost:8080/"
-   },
-   "id": "6cFG-YQq0uWg",
-   "outputId": "4ad910e9-0d14-4f4c-9e2c-85e3051083c3"
-  },
-  "outputs": [
-   {
-    "data": {
-     "text/plain": [
-      "<haystack.core.pipeline.pipeline.Pipeline object at 0x7c607962a550>\n",
-      "🚅 Components\n",
-      "  - retriever: InMemoryBM25Retriever\n",
-      "  - output_adapter: OutputAdapter\n",
-      "  - image_converter: ImageFileToImageContent\n",
-      "  - prompt_builder: ChatPromptBuilder\n",
-      "  - generator: OpenAIChatGenerator\n",
-      "🛤️ Connections\n",
-      "  - retriever.documents -> output_adapter.documents (List[Document])\n",
-      "  - output_adapter.output -> image_converter.sources (List[str])\n",
-      "  - image_converter.image_contents -> prompt_builder.image_contents (List[ImageContent])\n",
-      "  - prompt_builder.prompt -> generator.messages (List[ChatMessage])"
-     ]
-    },
-    "execution_count": 48,
-    "metadata": {},
-    "output_type": "execute_result"
-   }
-  ],
-  "source": [
-   "from typing import List\n",
-   "\n",
-   "from haystack import Pipeline\n",
-   "from haystack.components.converters import OutputAdapter\n",
-   "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n",
-   "\n",
-   "from haystack_experimental.components.builders import ChatPromptBuilder\n",
-   "from haystack_experimental.components.converters.image import ImageFileToImageContent\n",
-   "from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator\n",
-   "\n",
-   "\n",
-   "chat_template = \"\"\"\n",
-   "{% message role=\"user\" %}\n",
-   "{{query}}\n",
-   "{% for image_content in image_contents %}\n",
-   "  {{image_content | templatize_part}}\n",
-   "{% endfor %}\n",
-   "{% endmessage %}\n",
-   "\"\"\"\n",
-   "\n",
-   "output_adapter_template = \"\"\"\n",
-   "{%- set paths = [] -%}\n",
-   "{% for document in documents %}\n",
-   "  {%- set _ = paths.append(document.meta.image_path) -%}\n",
-   "{% endfor %}\n",
-   "{{paths}}\n",
-   "\"\"\"\n",
-   "\n",
-   "rag_pipeline = Pipeline()\n",
-   "\n",
-   "rag_pipeline.add_component(\"retriever\", InMemoryBM25Retriever(document_store=document_store, top_k=1))\n",
-   "rag_pipeline.add_component(\"output_adapter\", OutputAdapter(template=output_adapter_template, output_type=List[str]))\n",
-   "rag_pipeline.add_component(\"image_converter\", ImageFileToImageContent(detail=\"auto\"))\n",
-   "rag_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=chat_template, required_variables=\"*\"))\n",
-   "rag_pipeline.add_component(\"generator\", OpenAIChatGenerator(model=\"gpt-4o-mini\"))\n",
-   "\n",
-   "rag_pipeline.connect(\"retriever.documents\", \"output_adapter.documents\")\n",
-   "rag_pipeline.connect(\"output_adapter.output\", \"image_converter.sources\")\n",
-   "rag_pipeline.connect(\"image_converter.image_contents\", \"prompt_builder.image_contents\")\n",
-   "rag_pipeline.connect(\"prompt_builder.prompt\", \"generator.messages\")"
-  ]
- },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {
-   "colab": {
-    "base_uri": "https://localhost:8080/"
-   },
-   "id": "uX-y-mC038c5",
-   "outputId": "b1b85e55-8dce-4254-e18f-df4f92c9aeeb"
-  },
-  "outputs": [
-   {
-    "name": "stdout",
-    "output_type": "stream",
-    "text": [
-     "('The image illustrates how LoRA (Low-Rank Adaptation) systems learn \"intruder '\n",
-     " 'dimensions\"—singular vectors that differ from those in the pre-trained '\n",
-     " 'weight matrix during fine-tuning. \\n'\n",
-     " '\\n'\n",
-     " '- **Panel (a)** shows the architecture of LoRA and full fine-tuning, '\n",
-     " 'emphasizing the addition of learned parameters \\\\( B \\\\) and \\\\( A \\\\) in '\n",
-     " 'LoRA.\\n'\n",
-     " '- **Panel (b)** compares the cosine similarity of singular vectors from LoRA '\n",
-     " \"and full fine-tuning, revealing that LoRA's learned vectors diverge more \"\n",
-     " 'from pre-trained weights.\\n'\n",
-     " '- **Panel (c)** depicts cosine similarity distributions, highlighting that '\n",
-     " 'regular vectors stay consistent while intruder dimensions show significant '\n",
-     " 'deviation.')\n"
-    ]
-   }
-  ],
-  "source": [
-   "query = \"What the image from the Lora vs Full Fine-tuning paper tries to show? Be short.\"\n",
-   "\n",
-   "response = rag_pipeline.run(data={\"query\": query})[\"generator\"][\"replies\"][0].text\n",
-   "print(response)"
-  ]
- },
  {
   "cell_type": "markdown",
   "metadata": {
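The removed `output_adapter_template` in the hunk above is a Jinja loop that collects each `document.meta.image_path` into a list of source paths for the image converter. The same transformation, written as plain Python over hypothetical stand-in documents (the paths below are invented for illustration), is simply:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # Minimal stand-in for a Haystack Document; only `meta` matters here
    meta: dict = field(default_factory=dict)

documents = [
    Document(meta={"image_path": "images/capybara.jpg"}),
    Document(meta={"image_path": "images/flan_page9.jpg"}),
]

# Equivalent of the Jinja loop that appends document.meta.image_path per document
paths = [doc.meta["image_path"] for doc in documents]
print(paths)  # ['images/capybara.jpg', 'images/flan_page9.jpg']
```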
@@ -1046,9 +919,9 @@
    "\n",
    "from haystack.tools import tool\n",
    "from haystack.components.agents import Agent\n",
+   "from haystack.components.generators.chat import OpenAIChatGenerator\n",
    "\n",
-   "from haystack_experimental.dataclasses import ChatMessage, ImageContent\n",
-   "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n",
+   "from haystack.dataclasses import ChatMessage, ImageContent\n",
    "import python_weather\n",
    "\n",
    "# only needed in Jupyter notebooks where there is an event loop running\n",
@@ -1197,9 +1070,10 @@
   "source": [
    "## What's next?\n",
    "\n",
-   "You can follow the progress of the Multimodal experiment in this [GitHub issue](https://github.com/deepset-ai/haystack/issues/8976).\n",
+   "We will release a notebook soon to show how to build more advanced multimodal pipelines, with a variety of different formats\n",
+   "and also using multimodal embedding models for retrieval.\n",
    "\n",
-   "In the future, you can expect support for more LLM providers, improvements to multimodal indexing and retrieval pipelines, plus the exploration of other interesting directions.\n",
+   "We will also extend multimodal features to more model providers.\n",
    "\n",
    "(*Notebook by [Stefano Fiorucci](https://github.com/anakin87)*)"
   ]
