Releases: deepset-ai/haystack

v2.6.0-rc3

01 Oct 15:29
cd23720
Pre-release

Release Notes

⬆️ Upgrade Notes

  • gpt-3.5-turbo was replaced by gpt-4o-mini as the default model for all components relying on the OpenAI API
  • The legacy filter syntax support has been completely removed. Users need to use the new filter syntax. See the docs for more details.

🚀 New Features

  • Added a new component DocumentNDCGEvaluator, which is similar to DocumentMRREvaluator and useful for retrieval evaluation. It calculates the normalized discounted cumulative gain, an evaluation metric useful when there are multiple ground truth relevant documents and the order in which they are retrieved is important.
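
    A minimal usage sketch, assuming the component mirrors DocumentMRREvaluator's interface (ground_truth_documents and retrieved_documents inputs; score and individual_scores outputs):

from haystack import Document
from haystack.components.evaluators import DocumentNDCGEvaluator

evaluator = DocumentNDCGEvaluator()
# One inner list per query: the relevant (ground truth) documents and the documents retrieved.
result = evaluator.run(
    ground_truth_documents=[[Document(content="France"), Document(content="Paris")]],
    retrieved_documents=[[Document(content="France"), Document(content="Berlin")]],
)
print(result["individual_scores"])  # NDCG per query
print(result["score"])              # average NDCG over all queries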

  • Add new CSVToDocument component. It loads the file as a bytes object and adds the loaded string as a new Document that can be used for further processing, for example by the DocumentSplitter.
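
    A minimal sketch of the intended usage; the file path is a placeholder:

from haystack.components.converters import CSVToDocument

converter = CSVToDocument()
# "data.csv" is a placeholder; each source file becomes one Document holding the CSV text.
result = converter.run(sources=["data.csv"])
documents = result["documents"]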

  • Adds support for zero-shot document classification via the new TransformersZeroShotDocumentClassifier component. This allows you to classify documents into user-defined classes (binary and multi-label classification) using pre-trained models from Hugging Face.
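
    A usage sketch; the model is one example of a zero-shot NLI model, and the exact meta key that stores the prediction is an assumption:

from haystack import Document
from haystack.components.classifiers import TransformersZeroShotDocumentClassifier

classifier = TransformersZeroShotDocumentClassifier(
    model="cross-encoder/nli-deberta-v3-xsmall",  # example zero-shot model
    labels=["sports", "politics", "science"],
)
classifier.warm_up()
result = classifier.run(documents=[Document(content="The match ended 2-1 after extra time.")])
# The predicted label is expected in the document metadata (exact key name assumed).
print(result["documents"][0].meta)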

  • Added the option to use a custom splitting function in DocumentSplitter. The function must accept a string as input and return a list of strings representing the split units. To use this feature, initialize DocumentSplitter with split_by="function" and provide the custom function as splitting_function=custom_function.
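
    For example, splitting on a custom delimiter (a minimal sketch based on the description above):

from haystack import Document
from haystack.components.preprocessors import DocumentSplitter

def custom_function(text: str) -> list[str]:
    # Any callable that takes a string and returns a list of strings works.
    return text.split("---")

splitter = DocumentSplitter(split_by="function", splitting_function=custom_function)
result = splitter.run(documents=[Document(content="part one---part two---part three")])
print(len(result["documents"]))  # 3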

  • Add new JSONConverter component to convert JSON files to Documents. Optionally, it can use jq to filter the source JSON files and extract only specific parts.

import json
from haystack.components.converters import JSONConverter
from haystack.dataclasses import ByteStream

data = {
    "laureates": [
        {
            "firstname": "Enrico",
            "surname": "Fermi",
            "motivation": "for his demonstrations of the existence of new radioactive elements produced "
            "by neutron irradiation, and for his related discovery of nuclear reactions brought about by slow neutrons",
        },
        {
            "firstname": "Rita",
            "surname": "Levi-Montalcini",
            "motivation": "for their discoveries of growth factors",
        },
    ],
}
source = ByteStream.from_string(json.dumps(data))
converter = JSONConverter(jq_schema=".laureates[]", content_key="motivation", extra_meta_fields=["firstname", "surname"])
results = converter.run(sources=[source])
documents = results["documents"]
print(documents[0].content)
# 'for his demonstrations of the existence of new radioactive elements produced by
# neutron irradiation, and for his related discovery of nuclear reactions brought
# about by slow neutrons'
print(documents[0].meta)
# {'firstname': 'Enrico', 'surname': 'Fermi'}
print(documents[1].content)
# 'for their discoveries of growth factors'
print(documents[1].meta)
# {'firstname': 'Rita', 'surname': 'Levi-Montalcini'}
  • Added a new NLTKDocumentSplitter, a component enhancing document preprocessing capabilities with NLTK. This feature allows for fine-grained control over the splitting of documents into smaller parts based on configurable criteria such as word count, sentence boundaries, and page breaks. It supports multiple languages and offers options for handling sentence boundaries and abbreviations, facilitating better handling of various document types for further processing tasks.
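
    A usage sketch; the parameter names are assumed to mirror DocumentSplitter, and NLTK data may need to be downloaded on first use:

from haystack import Document
from haystack.components.preprocessors import NLTKDocumentSplitter

splitter = NLTKDocumentSplitter(split_by="sentence", split_length=3, split_overlap=1, language="en")
result = splitter.run(documents=[Document(content="First sentence. Second one. Third. Fourth.")])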

  • Updates SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder so model_max_length passed through tokenizer_kwargs also updates the max_seq_length of the underlying SentenceTransformer model.
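
    For example, to cap the sequence length (a sketch; the model name is illustrative):

from haystack.components.embedders import SentenceTransformersTextEmbedder

embedder = SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    tokenizer_kwargs={"model_max_length": 256},  # now also updates the model's max_seq_length
)
embedder.warm_up()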

⚡️ Enhancement Notes

  • Adapts how ChatPromptBuilder creates ChatMessages. Messages are deep copied to ensure all meta fields are copied correctly.

  • Expose default_headers to pass custom headers to the Azure API, including an APIM subscription key.

  • Add an optional azure_kwargs dictionary parameter to pass in parameters that are undefined in Haystack but supported by AzureOpenAI.
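
    A sketch combining the two Azure notes above; the endpoint, deployment, and header value are placeholders, and the accepted azure_kwargs keys depend on the AzureOpenAI client:

from haystack.components.generators import AzureOpenAIGenerator

generator = AzureOpenAIGenerator(
    azure_endpoint="https://example-resource.openai.azure.com",   # placeholder
    azure_deployment="gpt-4o-mini",                               # placeholder deployment name
    default_headers={"Ocp-Apim-Subscription-Key": "<APIM-KEY>"},  # e.g. APIM subscription key
    azure_kwargs={},  # extra AzureOpenAI client parameters not modeled by Haystack
)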

  • Added the ability to insert the current date inside a template in PromptBuilder using the following syntax:

    • {% now 'UTC' %}: Get the current date for the UTC timezone.
    • {% now 'America/Chicago' + 'hours=2' %}: Add two hours to the current date in the Chicago timezone.
    • {% now 'Europe/Berlin' - 'weeks=2' %}: Subtract two weeks from the current date in the Berlin timezone.
    • {% now 'Pacific/Fiji' + 'hours=2', '%H' %}: Display only the number of hours after adding two hours to the Fiji timezone.
    • {% now 'Etc/GMT-4', '%I:%M %p' %}: Change the date format to AM/PM for the GMT-4 timezone.

    Note that if no date format is provided, the default will be %Y-%m-%d %H:%M:%S. Please refer to the tz database for a list of time zones.
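
    For instance (a minimal sketch):

from haystack.components.builders import PromptBuilder

builder = PromptBuilder(template="Today is {% now 'UTC', '%Y-%m-%d' %}. Question: {{ query }}")
result = builder.run(query="What day is it?")
print(result["prompt"])  # e.g. "Today is 2024-10-01. Question: What day is it?"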

  • Adds usage meta field with prompt_tokens and completion_tokens keys to HuggingFaceAPIChatGenerator.

  • Add new GreedyVariadic input type. It behaves similarly to the Variadic input type in that it can be connected to multiple output sockets, but the Pipeline runs it as soon as it receives an input, without waiting for the others. This replaces the is_greedy argument in the @component decorator. If you had a Component with a Variadic input type and @component(is_greedy=True), change the type to GreedyVariadic and remove is_greedy=True from @component.
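
    A sketch of a component using the new type; the import path is assumed to match Variadic:

from haystack import component
from haystack.core.component.types import GreedyVariadic  # import path assumed

@component
class QuickSum:
    @component.output_types(total=int)
    def run(self, numbers: GreedyVariadic[int]):
        # Fires as soon as any connected sender delivers a value,
        # instead of waiting for all senders as Variadic does.
        return {"total": sum(numbers)}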

  • Add new Pipeline init argument max_runs_per_component. It behaves identically to the existing max_loops_allowed argument but is more descriptive of its actual effect.
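
    For example:

from haystack import Pipeline

# Same effect as the deprecated max_loops_allowed, with a clearer name.
pipeline = Pipeline(max_runs_per_component=5)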

  • Add new PipelineMaxComponentRuns exception to reflect the new max_runs_per_component init argument.

  • We added batching at inference time to the TransformersSimilarityRanker to help prevent OOMs when ranking large numbers of Documents.

⚠️ Deprecation Notes

  • The DefaultConverter class used by the PyPDFToDocument component has been deprecated. Its functionality will be merged into the component in 2.7.0.
  • Pipeline init argument debug_path is deprecated and will be removed in version 2.7.0.
  • @component decorator is_greedy argument is deprecated and will be removed in version 2.7.0. Use GreedyVariadic type instead.
  • Deprecate connecting a Component to itself when calling Pipeline.connect(); from version 2.7.0 onwards this will raise an error.
  • Pipeline init argument max_loops_allowed is deprecated and will be removed in version 2.7.0. Use max_runs_per_component instead.
  • PipelineMaxLoops exception is deprecated and will be removed in version 2.7.0. Use PipelineMaxComponentRuns instead.

🐛 Bug Fixes

  • Fix the serialization of PyPDFToDocument component to prevent the default converter from being serialized unnecessarily.
  • Add constraints to component.set_input_type and component.set_input_types to prevent undefined behaviour when the run method does not contain a variadic keyword argument.
  • Prevent set_output_types from being called when the output_types decorator is used.
  • Update the CHAT_WITH_WEBSITE Pipeline template to reflect the changes in the HTMLToDocument converter component.
  • Fix a Pipeline visualization issue due to changes in the new release of Mermaid.
  • Fixed the filters in the SentenceWindowRetriever, adding support for three more Document Stores: Astra, PGVector, and Qdrant.
  • Fix Pipeline not running Components with Variadic input even if they received inputs from only a subset of their senders.
  • The from_dict method of ConditionalRouter now correctly handles the case where the dict passed to it contains the key custom_filters explicitly set to None. Previously, this caused an AttributeError.
  • Make the from_dict method of the PyPDFToDocument more robust to cases when the converter is not provided in the dictionary.

v2.6.0-rc2

01 Oct 05:45
57f43be
Pre-release

Release Notes

⬆️ Upgrade Notes

  • gpt-3.5-turbo was replaced by gpt-4o-mini as the default model for all components relying on the OpenAI API
  • The legacy filter syntax support has been completely removed. Users need to use the new filter syntax. See the docs for more details.

🚀 New Features

  • Add new CSVToDocument component. It loads the file as a bytes object and adds the loaded string as a new Document that can be used for further processing, for example by the DocumentSplitter.

  • Adds support for zero-shot document classification via the new TransformersZeroShotDocumentClassifier component. This allows you to classify documents into user-defined classes (binary and multi-label classification) using pre-trained models from Hugging Face.

  • Added the option to use a custom splitting function in DocumentSplitter. The function must accept a string as input and return a list of strings representing the split units. To use this feature, initialize DocumentSplitter with split_by="function" and provide the custom function as splitting_function=custom_function.

  • Add new JSONConverter component to convert JSON files to Documents. Optionally, it can use jq to filter the source JSON files and extract only specific parts.

import json
from haystack.components.converters import JSONConverter
from haystack.dataclasses import ByteStream

data = {
    "laureates": [
        {
            "firstname": "Enrico",
            "surname": "Fermi",
            "motivation": "for his demonstrations of the existence of new radioactive elements produced "
            "by neutron irradiation, and for his related discovery of nuclear reactions brought about by slow neutrons",
        },
        {
            "firstname": "Rita",
            "surname": "Levi-Montalcini",
            "motivation": "for their discoveries of growth factors",
        },
    ],
}
source = ByteStream.from_string(json.dumps(data))
converter = JSONConverter(jq_schema=".laureates[]", content_key="motivation", extra_meta_fields=["firstname", "surname"])
results = converter.run(sources=[source])
documents = results["documents"]
print(documents[0].content)
# 'for his demonstrations of the existence of new radioactive elements produced by
# neutron irradiation, and for his related discovery of nuclear reactions brought
# about by slow neutrons'
print(documents[0].meta)
# {'firstname': 'Enrico', 'surname': 'Fermi'}
print(documents[1].content)
# 'for their discoveries of growth factors'
print(documents[1].meta)
# {'firstname': 'Rita', 'surname': 'Levi-Montalcini'}
  • Added a new NLTKDocumentSplitter, a component enhancing document preprocessing capabilities with NLTK. This feature allows for fine-grained control over the splitting of documents into smaller parts based on configurable criteria such as word count, sentence boundaries, and page breaks. It supports multiple languages and offers options for handling sentence boundaries and abbreviations, facilitating better handling of various document types for further processing tasks.

  • Updates SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder so model_max_length passed through tokenizer_kwargs also updates the max_seq_length of the underlying SentenceTransformer model.

⚡️ Enhancement Notes

  • Adapts how ChatPromptBuilder creates ChatMessages. Messages are deep copied to ensure all meta fields are copied correctly.

  • Expose default_headers to pass custom headers to the Azure API, including an APIM subscription key.

  • Add an optional azure_kwargs dictionary parameter to pass in parameters that are undefined in Haystack but supported by AzureOpenAI.

  • Added the ability to insert the current date inside a template in PromptBuilder using the following syntax:

    • {% now 'UTC' %}: Get the current date for the UTC timezone.
    • {% now 'America/Chicago' + 'hours=2' %}: Add two hours to the current date in the Chicago timezone.
    • {% now 'Europe/Berlin' - 'weeks=2' %}: Subtract two weeks from the current date in the Berlin timezone.
    • {% now 'Pacific/Fiji' + 'hours=2', '%H' %}: Display only the number of hours after adding two hours to the Fiji timezone.
    • {% now 'Etc/GMT-4', '%I:%M %p' %}: Change the date format to AM/PM for the GMT-4 timezone.

    Note that if no date format is provided, the default will be %Y-%m-%d %H:%M:%S. Please refer to the tz database for a list of time zones.

  • Adds usage meta field with prompt_tokens and completion_tokens keys to HuggingFaceAPIChatGenerator.

  • Add new GreedyVariadic input type. It behaves similarly to the Variadic input type in that it can be connected to multiple output sockets, but the Pipeline runs it as soon as it receives an input, without waiting for the others. This replaces the is_greedy argument in the @component decorator. If you had a Component with a Variadic input type and @component(is_greedy=True), change the type to GreedyVariadic and remove is_greedy=True from @component.

  • Add new Pipeline init argument max_runs_per_component. It behaves identically to the existing max_loops_allowed argument but is more descriptive of its actual effect.

  • Add new PipelineMaxComponentRuns exception to reflect the new max_runs_per_component init argument.

  • We added batching at inference time to the TransformersSimilarityRanker to help prevent OOMs when ranking large numbers of Documents.

⚠️ Deprecation Notes

  • Pipeline init argument debug_path is deprecated and will be removed in version 2.7.0.
  • @component decorator is_greedy argument is deprecated and will be removed in version 2.7.0. Use GreedyVariadic type instead.
  • Deprecate connecting a Component to itself when calling Pipeline.connect(); from version 2.7.0 onwards this will raise an error.
  • Pipeline init argument max_loops_allowed is deprecated and will be removed in version 2.7.0. Use max_runs_per_component instead.
  • PipelineMaxLoops exception is deprecated and will be removed in version 2.7.0. Use PipelineMaxComponentRuns instead.

🐛 Bug Fixes

  • Add constraints to component.set_input_type and component.set_input_types to prevent undefined behaviour when the run method does not contain a variadic keyword argument.
  • Prevent set_output_types from being called when the output_types decorator is used.
  • Update the CHAT_WITH_WEBSITE Pipeline template to reflect the changes in the HTMLToDocument converter component.
  • Fix a Pipeline visualization issue due to changes in the new release of Mermaid.
  • Fixed the filters in the SentenceWindowRetriever, adding support for three more Document Stores: Astra, PGVector, and Qdrant.
  • Fix Pipeline not running Components with Variadic input even if they received inputs from only a subset of their senders.
  • The from_dict method of ConditionalRouter now correctly handles the case where the dict passed to it contains the key custom_filters explicitly set to None. Previously, this caused an AttributeError.
  • Make the from_dict method of the PyPDFToDocument more robust to cases when the converter is not provided in the dictionary.

v2.5.1

10 Sep 14:08

Release Notes

⚡️ Enhancement Notes

  • Add default_headers init argument to AzureOpenAIGenerator and AzureOpenAIChatGenerator

🐛 Bug Fixes

  • Fix the Pipeline visualization issue due to changes in the new release of Mermaid
  • Fix Pipeline not running Components with Variadic input even if they received inputs from only a subset of their senders
  • The from_dict method of ConditionalRouter now correctly handles the case where the dict passed to it contains the key custom_filters explicitly set to None. Previously, this caused an AttributeError

v2.5.1-rc2

10 Sep 13:25
Pre-release

Release Notes

⚡️ Enhancement Notes

  • Add default_headers init argument to AzureOpenAIGenerator and AzureOpenAIChatGenerator

🐛 Bug Fixes

  • Fix the Pipeline visualization issue due to changes in the new release of Mermaid
  • Fix Pipeline not running Components with Variadic input even if they received inputs from only a subset of their senders
  • The from_dict method of ConditionalRouter now correctly handles the case where the dict passed to it contains the key custom_filters explicitly set to None. Previously, this caused an AttributeError

v2.5.1-rc1

10 Sep 13:06
Pre-release

Release Notes

⚡️ Enhancement Notes

  • Add default_headers init argument to AzureOpenAIGenerator and AzureOpenAIChatGenerator

🐛 Bug Fixes

  • Fix Pipeline not running Components with Variadic input even if they received inputs from only a subset of their senders
  • The from_dict method of ConditionalRouter now correctly handles the case where the dict passed to it contains the key custom_filters explicitly set to None. Previously, this caused an AttributeError

v2.5.0

04 Sep 14:04

Release Notes

⬆️ Upgrade Notes

  • Removed ChatMessage.to_openai_format method. Use haystack.components.generators.openai_utils._convert_message_to_openai_format instead.
  • Removed unused debug parameter from Pipeline.run method.
  • Removed deprecated SentenceWindowRetrieval. Use SentenceWindowRetriever instead.

🚀 New Features

  • Added the unsafe argument to enable behavior that could lead to remote code execution in ConditionalRouter and OutputAdapter. By default, unsafe behavior is disabled, and users must explicitly set unsafe=True to enable it. When unsafe is enabled, types such as ChatMessage, Document, and Answer can be used as output types. We recommend enabling unsafe behavior only when the Jinja template source is trusted. For more information, see the documentation for ConditionalRouter and OutputAdapter.
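
    A sketch with OutputAdapter; enable unsafe only for templates you control:

from haystack.components.converters import OutputAdapter
from haystack.dataclasses import ChatMessage

adapter = OutputAdapter(
    template="{{ messages[-1] }}",
    output_type=ChatMessage,  # non-primitive output types require unsafe=True
    unsafe=True,              # only with trusted template sources
)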

⚡️ Enhancement Notes

  • Adapts how ChatPromptBuilder creates ChatMessages. Messages are deep copied to ensure all meta fields are copied correctly.
  • The min_top_k parameter has been added to the TopPSampler. It sets the minimum number of documents to be returned when the top-p sampling algorithm selects fewer documents than desired; documents with the next highest scores are added to meet the minimum. This is useful for guaranteeing that a set number of documents passes through while still allowing the Top-P algorithm to determine, based on scores, whether more documents should be sent (see the sketch after this list).
  • Introduced a utility function to deserialize a generic Document Store from the init_parameters of a serialized component.
  • Refactor deserialize_document_store_in_init_parameters to clarify that the function operates in place and does not return a value.
  • The SentenceWindowRetriever now returns context_documents as well as the context_windows for each Document in retrieved_documents. This allows you to get a list of Documents from within the context window for each retrieved document.
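
A minimal sketch of the min_top_k option mentioned above:

from haystack.components.samplers import TopPSampler

# Top-P keeps documents whose cumulative score mass stays within top_p,
# but never returns fewer than min_top_k documents.
sampler = TopPSampler(top_p=0.9, min_top_k=3)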

⚠️ Deprecation Notes

  • The default model for OpenAIGenerator and OpenAIChatGenerator, previously 'gpt-3.5-turbo', will be replaced by 'gpt-4o-mini'.

🐛 Bug Fixes

  • Fixed an issue where page breaks were not being extracted from DOCX files.
  • Used a forward reference for the Paragraph class in the DOCXToDocument converter to prevent import errors.
  • The metadata produced by DOCXToDocument component is now JSON serializable. Previously, it contained datetime objects automatically extracted from DOCX files, which are not JSON serializable. These datetime objects are now converted to strings.
  • Starting from haystack-ai==2.4.0, Haystack is compatible with sentence-transformers>=3.0.0; earlier versions of sentence-transformers are not supported. We have updated the test dependencies and LazyImport messages to reflect this change.
  • For components that support multiple Document Stores, prioritize using the specific from_dict class method for deserialization when available. Otherwise, fall back to the generic default_from_dict method. This impacts the following generic components: CacheChecker, DocumentWriter, FilterRetriever, and SentenceWindowRetriever.

v2.5.0-rc3

04 Sep 09:51
Pre-release

Release Notes

Enhancement Notes

  • Adapts how ChatPromptBuilder creates ChatMessages. Messages are deep copied to ensure all meta fields are copied correctly.

v2.5.0-rc2

02 Sep 15:22
Pre-release

Release Notes

Upgrade Notes

  • Remove ChatMessage.to_openai_format method. Use haystack.components.generators.openai_utils._convert_message_to_openai_format instead.
  • Remove unused debug parameter from Pipeline.run method.
  • Removed deprecated SentenceWindowRetrieval. Use SentenceWindowRetriever instead.

New Features

  • Add unsafe argument to enable behaviour that could lead to remote code execution in ConditionalRouter and OutputAdapter. By default, unsafe behaviour is disabled; the user must explicitly set unsafe=True to enable it. When unsafe is enabled, types such as ChatMessage, Document, and Answer can be used as output types. We recommend enabling unsafe behaviour only when the Jinja template source is trusted. For more info, see the documentation for ConditionalRouter and OutputAdapter.

Enhancement Notes

  • The min_top_k parameter is added to the TopPSampler; it sets the minimum number of documents to be returned when the top-p sampling algorithm selects fewer documents than desired. The documents with the next highest scores are added to the selection. This is useful when we want to guarantee that a set number of documents will always be passed on, while still allowing the Top-P algorithm to determine, based on document scores, whether more documents should be sent.
  • Introduce a utility function to deserialize a generic Document Store from the init_parameters of a serialized component.
  • Refactor deserialize_document_store_in_init_parameters so that the new function name indicates that the operation occurs in place, with no return value.
  • The SentenceWindowRetriever now has an extra output key containing all the documents belonging to the context window.

Deprecation Notes

  • SentenceWindowRetrieval is deprecated and will be removed in future. Use SentenceWindowRetriever instead.
  • The default model for OpenAIGenerator and OpenAIChatGenerator, currently 'gpt-3.5-turbo', will be replaced by 'gpt-4o-mini'.

Bug Fixes

  • Fixed an issue where page breaks were not being extracted from DOCX files.
  • Use a forward reference for the Paragraph class in the DOCXToDocument converter to prevent import errors.
  • The metadata produced by DOCXToDocument component is now JSON serializable. Previously, it contained datetime objects automatically extracted from DOCX files, which are not JSON serializable. Now, the datetime objects are converted to strings.
  • Starting from haystack-ai==2.4.0, Haystack is compatible with sentence-transformers>=3.0.0; earlier versions of sentence-transformers are not supported. We are updating the test dependency and the LazyImport messages to reflect that.
  • For components that support multiple Document Stores, prioritize using the specific from_dict class method for deserialization when available. Otherwise, fall back to the generic default_from_dict method. This impacts the following generic components: CacheChecker, DocumentWriter, FilterRetriever, and SentenceWindowRetriever.

v1.26.3

29 Aug 14:00

Release Notes

v1.26.3

⬆️ Upgrade Notes

  • Upgrades nltk to 3.9.1, as prior versions are affected by https://nvd.nist.gov/vuln/detail/CVE-2024-39705. Due to these security vulnerabilities, it is not possible to use custom NLTK tokenizer models with the new version (for example, in PreProcessor). Users can still use the built-in NLTK tokenizers by specifying the language parameter in the PreProcessor. See the PreProcessor documentation for more details.

⚡️ Enhancement Notes

  • Pins sentence-transformers>=2.3.1,<=3.0.0 and python-pptx<=1.0 to avoid some minor typing incompatibilities with newer versions of the respective libraries.

v2.4.0

15 Aug 09:39
8dd610a

Release Notes

v2.4.0

Highlights

🙌 Local LLMs and custom generation parameters in evaluation

The new api_params init parameter added to LLM-based evaluators such as ContextRelevanceEvaluator and FaithfulnessEvaluator can be used to pass in supported OpenAIGenerator parameters, allowing for custom generation parameters (via generation_kwargs) and local LLM support (via api_base_url).
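
For example, pointing an evaluator at a local OpenAI-compatible server (a sketch; the URL is a placeholder and the client may still require an API key):

from haystack.components.evaluators import FaithfulnessEvaluator

evaluator = FaithfulnessEvaluator(
    api_params={
        "api_base_url": "http://localhost:8000/v1",  # placeholder local endpoint
        "generation_kwargs": {"temperature": 0.0},   # custom generation parameters
    },
)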

📝 New Joiner

New AnswerJoiner component to combine multiple lists of Answers.
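
A minimal sketch of joining two answer lists:

from haystack.components.joiners import AnswerJoiner
from haystack.dataclasses import GeneratedAnswer

a = GeneratedAnswer(data="Paris", query="Capital of France?", documents=[], meta={})
b = GeneratedAnswer(data="Paris, France", query="Capital of France?", documents=[], meta={})

joiner = AnswerJoiner()  # the default join mode concatenates
result = joiner.run(answers=[[a], [b]])
print(len(result["answers"]))  # 2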

⬆️ Upgrade Notes

  • The ContextRelevanceEvaluator now returns a list of relevant sentences for each context, instead of all the sentences in a context. Also, a score of 1 is now returned if a relevant sentence is found, and 0 otherwise.
  • Removed the deprecated DynamicPromptBuilder and DynamicChatPromptBuilder components. Use PromptBuilder and ChatPromptBuilder instead.
  • OutputAdapter and ConditionalRouter can no longer return user inputs.
  • Multiplexer is removed and users should switch to BranchJoiner instead.
  • Removed deprecated init parameters extractor_type and try_others from HTMLToDocument.
  • The SentenceWindowRetrieval component has been renamed to SentenceWindowRetriever.
  • The serialize_callback_handler and deserialize_callback_handler utility functions have been removed. Use serialize_callable and deserialize_callable instead. For more information on serialize_callable and deserialize_callable, see the API reference: https://docs.haystack.deepset.ai/reference/utils-api#module-callable_serialization

🚀 New Features

  • LLM based evaluators can pass in supported OpenAIGenerator parameters via api_params. This allows for custom generation_kwargs, changing the api_base_url (for local evaluation), and all other supported parameters as described in the OpenAIGenerator docs.
  • Introduced a new AnswerJoiner component that allows joining multiple lists of Answers into a single list using the Concatenate join mode.
  • Add truncate_dim parameter to Sentence Transformers Embedders, which allows truncating embeddings. Especially useful for models trained with Matryoshka Representation Learning.
  • Add precision parameter to Sentence Transformers Embedders, which allows quantized embeddings. Especially useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
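
    A sketch of the two embedder options above; the model name is illustrative:

from haystack.components.embedders import SentenceTransformersDocumentEmbedder

embedder = SentenceTransformersDocumentEmbedder(
    model="mixedbread-ai/mxbai-embed-large-v1",  # example Matryoshka-trained model
    truncate_dim=256,  # keep only the first 256 embedding dimensions
    precision="int8",  # quantized embeddings to shrink a semantic-search index
)
embedder.warm_up()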

⚡️ Enhancement Notes

  • Adds model_kwargs and tokenizer_kwargs to the components TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder. This allows passing things like model_max_length or torch_dtype for better management of model inference.
  • Added unicode_normalization parameter to the DocumentCleaner, allowing normalization of the text to NFC, NFD, NFKC, or NFKD.
  • Added ascii_only parameter to the DocumentCleaner, transforming letters with diacritics into their ASCII equivalents and removing other non-ASCII characters (both options are sketched after this list).
  • Improved error messages for deserialization errors.
  • TikaDocumentConverter now returns page breaks ("\f") in the output. This only works for PDF files.
  • Enhanced filter application logic to support merging of filters. It facilitates more precise retrieval filtering, allowing for both init and runtime complex filter combinations with logical operators. For more details see https://docs.haystack.deepset.ai/docs/metadata-filtering
  • The streaming_callback parameter can be passed to OpenAIGenerator and OpenAIChatGenerator during pipeline run. This prevents the need to recreate pipelines for streaming callbacks (see the sketch after this list).
  • Add max_retries and timeout parameters to the AzureOpenAIChatGenerator initialization.
  • Document Python 3.11 and 3.12 support in project configuration.
  • Refactor DocumentJoiner to use an enum pattern for the 'join_mode' parameter instead of a bare string.
  • Add max_retries and timeout parameters to the AzureOpenAIDocumentEmbedder initialization.
  • Add max_retries and timeout parameters to the AzureOpenAITextEmbedder initialization.
  • Introduce a utility function to deserialize a generic Document Store from the init_parameters of a serialized component.
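
A sketch of the new DocumentCleaner options from this list:

from haystack.components.preprocessors import DocumentCleaner

cleaner = DocumentCleaner(
    unicode_normalization="NFKC",  # one of NFC, NFD, NFKC, NFKD
    ascii_only=True,               # strip diacritics and drop other non-ASCII characters
)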
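
And a sketch of passing streaming_callback at run time rather than at init (requires an OpenAI API key):

from haystack.components.generators import OpenAIGenerator

def on_chunk(chunk):
    print(chunk.content, end="", flush=True)

generator = OpenAIGenerator()
# The callback is supplied per call, so the same pipeline can stream or not.
result = generator.run(prompt="Tell me a joke", streaming_callback=on_chunk)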

⚠️ Deprecation Notes

  • Haystack 1.x legacy filters are deprecated and will be removed in a future release. Please use the new filter style as described in the documentation - https://docs.haystack.deepset.ai/docs/metadata-filtering
  • Deprecate the method to_openai_format of the ChatMessage dataclass. This method was never intended to be public and was only used internally. Now, each Chat Generator will know internally how to convert the messages to the format of their specific provider.
  • Deprecate the unused debug parameter in the Pipeline.run method.
  • SentenceWindowRetrieval is deprecated and will be removed in future. Use SentenceWindowRetriever instead.

Security Notes

  • Fix issue that could lead to remote code execution when using an insecure Jinja template in the following Components:

    • PromptBuilder
    • ChatPromptBuilder
    • OutputAdapter
    • ConditionalRouter

    The same issue has been fixed in the PipelineTemplate class too.

🐛 Bug Fixes

  • Fix ChatPromptBuilder from_dict method when template value is None.
  • Fix the DocumentCleaner removing the "\f" page-break tag from content, which prevented page numbers from being counted (by the DocumentSplitter, for example).
  • The DocumentSplitter was incorrectly calculating split_start_idx and _split_overlap due to slight miscalculations of the appropriate indices. This fixes those calculations so that the split_start_idx and _split_overlap information is correct.
  • Fix a bug in Pipeline.run() that executed Components in a wrong and unexpected order
  • Fix the encoding of HTML files in LinkContentFetcher
  • Fix Output Adapter from_dict method when custom_filters value is None.
  • Prevent Pipeline.from_dict from modifying the dictionary parameter passed to it.
  • Fix a bug in Pipeline.run() that would cause it to get stuck in an infinite loop and never return. This was caused by Components waiting forever for their inputs when parts of the Pipeline graph were skipped because a "decision" Component did not return outputs for that side of the Pipeline.
  • Updates the from_dict methods of TransformersSimilarityRanker, SentenceTransformersDiversityRanker, SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder, and LocalWhisperTranscriber to work when loading with init_parameters that contain only the required parameters.
  • Pins structlog to <= 24.2.0 to avoid some unit test failures. This is a temporary fix until we can upgrade the tests to a newer version of structlog.
  • Correctly expose PPTXToDocument component in haystack namespace.
  • Fix TransformersZeroShotTextRouter and TransformersTextRouter from_dict methods to work when init_parameters only contain required variables.
  • For components that support multiple Document Stores, prioritize using the specific from_dict class method for deserialization when available. Otherwise, fall back to the generic default_from_dict method. This impacts the following generic components: CacheChecker, DocumentWriter, FilterRetriever, and SentenceWindowRetriever.