|
8 | 8 | "source": [
|
9 | 9 | "# 🖼️ Introduction to Multimodal Text Generation\n",
|
10 | 10 | "\n",
|
11 | | - "In this notebook, we introduce the experimental features we've developed so far for multimodal text generation in Haystack. The experiment is ongoing, so expect more in the future.\n", |
| 11 | + "In this notebook, we introduce the features that enable multimodal text generation in Haystack.\n", |
12 | 12 | "\n",
|
13 | 13 | "- We introduced the `ImageContent` dataclass, which represents the image content of a user `ChatMessage`.\n",
|
14 | 14 | "- We developed some image converter components.\n",
|
|
35 | 35 | },
|
36 | 36 | "outputs": [],
|
37 | 37 | "source": [
|
38 | | - "!pip install \"haystack-experimental\" gdown nest_asyncio pillow pypdfium2 python-weather" |
| 38 | + "!pip install haystack-ai gdown nest_asyncio pillow pypdfium2 python-weather" |
39 | 39 | ]
|
40 | 40 | },
|
41 | 41 | {
|
|
75 | 75 | "source": [
|
76 | 76 | "## Introduction to `ImageContent`\n",
|
77 | 77 | "\n",
|
78 | | - "[`ImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
| 78 | + "[`ImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
79 | 79 | "\n",
|
80 | 80 | "It has the following attributes:\n",
|
81 | 81 | "- `base64_image`: A base64 string representing the image.\n",
|
|
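As a quick, concrete illustration of these attributes, here is a minimal sketch of building an `ImageContent` and sending it to a vision-capable chat model. It assumes the `ImageContent.from_file_path` helper mentioned later in this notebook and a `content_parts` argument on `ChatMessage.from_user` for mixing text and image parts; check the current API reference before relying on these exact signatures.

```python
# Minimal sketch (assumed API: ImageContent.from_file_path and the
# content_parts argument of ChatMessage.from_user).
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.components.generators.chat import OpenAIChatGenerator

# Load a local image; a low detail hint keeps token usage down.
image = ImageContent.from_file_path("capybara.jpg", detail="low")

# A user message can carry both text and image parts.
message = ChatMessage.from_user(
    content_parts=["Describe this image in one sentence.", image]
)

llm = OpenAIChatGenerator(model="gpt-4o-mini")
print(llm.run(messages=[message])["replies"][0].text)
```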
129 | 129 | },
|
130 | 130 | "outputs": [],
|
131 | 131 | "source": [
|
132 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
133 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 132 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
| 133 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
134 | 134 | "import base64\n",
|
135 | 135 | "\n",
|
136 | 136 | "with open(\"capybara.jpg\", \"rb\") as fd:\n",
|
|
364 | 364 | "## Image Converters for `ImageContent`\n",
|
365 | 365 | "\n",
|
366 | 366 | "To perform image conversion in multimodal pipelines, we also introduced two image converters:\n",
|
367 | | - "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
368 | | - "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
| 367 | + "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
| 368 | + "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
369 | 369 | ]
|
370 | 370 | },
|
371 | 371 | {
|
|
376 | 376 | },
|
377 | 377 | "outputs": [],
|
378 | 378 | "source": [
|
379 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
| 379 | + "from haystack.components.converters.image import ImageFileToImageContent\n", |
380 | 380 | "\n",
|
381 | 381 | "converter = ImageFileToImageContent(detail=\"low\", size=(300, 300))\n",
|
382 | 382 | "result = converter.run(sources=[\"capybara.jpg\"])"
|
|
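The converter returns its results under the `image_contents` key. As a quick follow-up to the cell above, the sketch below shows how those objects could be placed into a user message; the `content_parts` argument is an assumption, as in the earlier sketch.

```python
# Sketch: place the converted images into a user message
# (content_parts is an assumed argument of ChatMessage.from_user).
from haystack.dataclasses import ChatMessage

image_contents = result["image_contents"]  # output of converter.run above
message = ChatMessage.from_user(
    content_parts=["What do these images show?", *image_contents]
)
```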
477 | 477 | }
|
478 | 478 | ],
|
479 | 479 | "source": [
|
480 | | - "from haystack_experimental.components.converters.image import PDFToImageContent\n", |
| 480 | + "from haystack.components.converters.image import PDFToImageContent\n", |
481 | 481 | "\n",
|
482 | 482 | "pdf_converter = PDFToImageContent()\n",
|
483 | 483 | "paper_page_image = pdf_converter.run(sources=[\"flan_paper.pdf\"], page_range=\"9\")[\"image_contents\"][0]\n",
|
|
615 | 615 | }
|
616 | 616 | ],
|
617 | 617 | "source": [
|
618 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
| 618 | + "from haystack.components.builders import ChatPromptBuilder\n", |
619 | 619 | "\n",
|
620 | 620 | "builder = ChatPromptBuilder(template, required_variables=\"*\")\n",
|
621 | 621 | "\n",
|
|
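For context, the `template` passed to `ChatPromptBuilder` above is a Jinja-based chat template. A sketch of what a multimodal version might look like follows, assuming the `{% message %}` block syntax and the `templatize_part` filter for embedding `ImageContent` parts.

```python
# Sketch of a multimodal ChatPromptBuilder template (the {% message %}
# syntax and the templatize_part filter are assumptions noted above).
template = """
{% message role="user" %}
Answer the question using the attached images.
Question: {{ question }}
{% for img in image_contents %}
  {{ img | templatize_part }}
{% endfor %}
{% endmessage %}
"""
```

With `required_variables="*"`, every variable used in the template must be supplied when the builder runs, and the rendered messages are returned under the `prompt` key.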
786 | 786 | "outputs": [],
|
787 | 787 | "source": [
|
788 | 788 | "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n",
|
789 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
790 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
| 789 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
| 790 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
791 | 791 | "\n",
|
792 | 792 | "retriever = InMemoryBM25Retriever(document_store=document_store)\n",
|
793 | 793 | "llm = OpenAIChatGenerator(model=\"gpt-4o-mini\")\n",
|
|
891 | 891 | "]"
|
892 | 892 | ]
|
893 | 893 | },
|
894 | | - { |
895 | | - "cell_type": "markdown", |
896 | | - "metadata": { |
897 | | - "id": "hgPfD4sD0uqW" |
898 | | - }, |
899 | | - "source": [ |
900 | | - "### Using a Pipeline\n", |
901 | | - "\n", |
902 | | - "The retrieval part of the application showed above can also be implemented using a Pipeline.\n", |
903 | | - "\n", |
904 | | - "As you can see, there are some aspects to improve in terms of developer experience and we will work in this direction." |
905 | | - ] |
906 | | - }, |
907 | | - { |
908 | | - "cell_type": "code", |
909 | | - "execution_count": null, |
910 | | - "metadata": { |
911 | | - "colab": { |
912 | | - "base_uri": "https://localhost:8080/" |
913 | | - }, |
914 | | - "id": "6cFG-YQq0uWg", |
915 | | - "outputId": "4ad910e9-0d14-4f4c-9e2c-85e3051083c3" |
916 | | - }, |
917 | | - "outputs": [ |
918 | | - { |
919 | | - "data": { |
920 | | - "text/plain": [ |
921 | | - "<haystack.core.pipeline.pipeline.Pipeline object at 0x7c607962a550>\n", |
922 | | - "🚅 Components\n", |
923 | | - " - retriever: InMemoryBM25Retriever\n", |
924 | | - " - output_adapter: OutputAdapter\n", |
925 | | - " - image_converter: ImageFileToImageContent\n", |
926 | | - " - prompt_builder: ChatPromptBuilder\n", |
927 | | - " - generator: OpenAIChatGenerator\n", |
928 | | - "🛤️ Connections\n", |
929 | | - " - retriever.documents -> output_adapter.documents (List[Document])\n", |
930 | | - " - output_adapter.output -> image_converter.sources (List[str])\n", |
931 | | - " - image_converter.image_contents -> prompt_builder.image_contents (List[ImageContent])\n", |
932 | | - " - prompt_builder.prompt -> generator.messages (List[ChatMessage])" |
933 | | - ] |
934 | | - }, |
935 | | - "execution_count": 48, |
936 | | - "metadata": {}, |
937 | | - "output_type": "execute_result" |
938 | | - } |
939 | | - ], |
940 | | - "source": [ |
941 | | - "from typing import List\n", |
942 | | - "\n", |
943 | | - "from haystack import Pipeline\n", |
944 | | - "from haystack.components.converters import OutputAdapter\n", |
945 | | - "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n", |
946 | | - "\n", |
947 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
948 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
949 | | - "from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator\n", |
950 | | - "\n", |
951 | | - "\n", |
952 | | - "chat_template = \"\"\"\n", |
953 | | - "{% message role=\"user\" %}\n", |
954 | | - "{{query}}\n", |
955 | | - "{% for image_content in image_contents %}\n", |
956 | | - "  {{image_content | templatize_part}}\n", |
957 | | - "{% endfor %}\n", |
958 | | - "{% endmessage %}\n", |
959 | | - "\"\"\"\n", |
960 | | - "\n", |
961 | | - "output_adapter_template = \"\"\"\n", |
962 | | - "{%- set paths = [] -%}\n", |
963 | | - "{% for document in documents %}\n", |
964 | | - "  {%- set _ = paths.append(document.meta.image_path) -%}\n", |
965 | | - "{% endfor %}\n", |
966 | | - "{{paths}}\n", |
967 | | - "\"\"\"\n", |
968 | | - "\n", |
969 | | - "rag_pipeline = Pipeline()\n", |
970 | | - "\n", |
971 | | - "rag_pipeline.add_component(\"retriever\", InMemoryBM25Retriever(document_store=document_store, top_k=1))\n", |
972 | | - "rag_pipeline.add_component(\"output_adapter\", OutputAdapter(template=output_adapter_template, output_type=List[str]))\n", |
973 | | - "rag_pipeline.add_component(\"image_converter\", ImageFileToImageContent(detail=\"auto\"))\n", |
974 | | - "rag_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=chat_template, required_variables=\"*\"))\n", |
975 | | - "rag_pipeline.add_component(\"generator\", OpenAIChatGenerator(model=\"gpt-4o-mini\"))\n", |
976 | | - "\n", |
977 | | - "rag_pipeline.connect(\"retriever.documents\", \"output_adapter.documents\")\n", |
978 | | - "rag_pipeline.connect(\"output_adapter.output\", \"image_converter.sources\")\n", |
979 | | - "rag_pipeline.connect(\"image_converter.image_contents\", \"prompt_builder.image_contents\")\n", |
980 | | - "rag_pipeline.connect(\"prompt_builder.prompt\", \"generator.messages\")" |
981 | | - ] |
982 | | - }, |
983 | | - { |
984 | | - "cell_type": "code", |
985 | | - "execution_count": null, |
986 | | - "metadata": { |
987 | | - "colab": { |
988 | | - "base_uri": "https://localhost:8080/" |
989 | | - }, |
990 | | - "id": "uX-y-mC038c5", |
991 | | - "outputId": "b1b85e55-8dce-4254-e18f-df4f92c9aeeb" |
992 | | - }, |
993 | | - "outputs": [ |
994 | | - { |
995 | | - "name": "stdout", |
996 | | - "output_type": "stream", |
997 | | - "text": [ |
998 | | - "('The image illustrates how LoRA (Low-Rank Adaptation) systems learn \"intruder '\n", |
999 | | - " 'dimensions\"—singular vectors that differ from those in the pre-trained '\n", |
1000 | | - " 'weight matrix during fine-tuning. \\n'\n", |
1001 | | - " '\\n'\n", |
1002 | | - " '- **Panel (a)** shows the architecture of LoRA and full fine-tuning, '\n", |
1003 | | - " 'emphasizing the addition of learned parameters \\\\( B \\\\) and \\\\( A \\\\) in '\n", |
1004 | | - " 'LoRA.\\n'\n", |
1005 | | - " '- **Panel (b)** compares the cosine similarity of singular vectors from LoRA '\n", |
1006 | | - " \"and full fine-tuning, revealing that LoRA's learned vectors diverge more \"\n", |
1007 | | - " 'from pre-trained weights.\\n'\n", |
1008 | | - " '- **Panel (c)** depicts cosine similarity distributions, highlighting that '\n", |
1009 | | - " 'regular vectors stay consistent while intruder dimensions show significant '\n", |
1010 | | - " 'deviation.')\n" |
1011 | | - ] |
1012 | | - } |
1013 | | - ], |
1014 | | - "source": [ |
1015 | | - "query = \"What the image from the Lora vs Full Fine-tuning paper tries to show? Be short.\"\n", |
1016 | | - "\n", |
1017 | | - "response = rag_pipeline.run(data={\"query\": query})[\"generator\"][\"replies\"][0].text\n", |
1018 | | - "print(response)" |
1019 | | - ] |
1020 | | - }, |
1021 | 894 | {
|
1022 | 895 | "cell_type": "markdown",
|
1023 | 896 | "metadata": {
|
|
1046 | 919 | "\n",
|
1047 | 920 | "from haystack.tools import tool\n",
|
1048 | 921 | "from haystack.components.agents import Agent\n",
|
| 922 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
1049 | 923 | "\n",
|
1050 | | - "from haystack_experimental.dataclasses import ChatMessage, ImageContent\n", |
1051 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 924 | + "from haystack.dataclasses import ChatMessage, ImageContent\n", |
1052 | 925 | "import python_weather\n",
|
1053 | 926 | "\n",
|
1054 | 927 | "# only needed in Jupyter notebooks where there is an event loop running\n",
|
|
1197 | 1070 | "source": [
|
1198 | 1071 | "## What's next?\n",
|
1199 | 1072 | "\n",
|
1200 | | - "You can follow the progress of the Multimodal experiment in this [GitHub issue](https://github.com/deepset-ai/haystack/issues/8976).\n", |
| 1073 | + "We will release a notebook soon to show how to build more advanced multimodal pipelines, with a variety of different formats\n", |
| 1074 | + "and also using multimodal embedding models for retrieval.\n", |
1201 | 1075 | "\n",
|
1202 | | - "In the future, you can expect support for more LLM providers, improvements to multimodal indexing and retrieval pipelines, plus the exploration of other interesting directions.\n", |
| 1076 | + "We will also extend multimodal features to more model providers.\n", |
1203 | 1077 | "\n",
|
1204 | 1078 | "(*Notebook by [Stefano Fiorucci](https://github.com/anakin87)*)"
|
1205 | 1079 | ]
|
|