Skip to content

Spark NLP 6.1.0 Release #14634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 23, 2025
Merged

Spark NLP 6.1.0 Release #14634

merged 18 commits into from
Jul 23, 2025

Conversation

DevinTDHa
Copy link
Member

@DevinTDHa DevinTDHa commented Jul 23, 2025

Description

Merged PRs:

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

AbdullahMubeenAnwar and others added 17 commits July 23, 2025 11:50
* Add Phi-4 model implementation and tokenizer support

- Introduced `Phi4` class for the state-of-the-art Phi-4 model by Microsoft Research, including methods for encoding, decoding, and tagging.
- Added `Phi4Transformer` for integration with Spark NLP, enabling pretrained model loading and configuration.
- Implemented `Phi4Tokenizer` for byte pair encoding specific to the Phi-4 model.
- Updated `BpeTokenizer` to support the new Phi-4 tokenizer.
- Added tests for `Phi4Transformer` to ensure functionality and performance.
- Updated resource downloader to include `Phi4Transformer` for pretrained model access.

* Enhance documentation for Phi-4 model in `Phi4Transformer.scala`

- Expanded the model description to include detailed information on parameters, intended use, benchmarks, safety, limitations, and usage instructions.
- Improved formatting for better readability and clarity.
- Added references for further information on the Phi-4 model.

* Add Phi-4 Transformer implementation and integration

- Introduced `Phi4Transformer` class for the state-of-the-art Phi-4 model by Microsoft Research, enabling advanced reasoning and NLP tasks.
- Added support for loading pretrained models and configuration parameters.
- Implemented a loader for the Phi-4 model in the internal module.
- Created unit tests to validate the functionality of the `Phi4Transformer`.
- Updated the main `__init__.py` to include the new transformer in the module exports.

* Add documentation and example notebook for Phi-4 Transformer

- Created a detailed markdown file for the `Phi4Transformer`, outlining its features, usage, and pretrained model loading instructions.
- Added a Jupyter notebook demonstrating the integration of the Phi-4 model with Spark NLP and Intel OpenVINO, including installation steps and example code.
- Enhanced the Scala implementation comments for better clarity and formatting.
- Updated the test specifications to reflect changes in the handling of temperature settings during predictions.
* Upgrade jsl-llama.cpp

- Embeddings Passing
- Adjust metadata extraction
- Fix changed parameters
- Add default system prompt
- Default params for AutoGGUFEmbeddings

* jsl-llama.cpp upgrade python side
@DevinTDHa DevinTDHa force-pushed the release/610-release-candidate branch from 6ecee5e to b5d2e50 Compare July 23, 2025 14:19
@DevinTDHa DevinTDHa merged commit b5b4381 into master Jul 23, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants