
Spark NLP 6.1.0 Release #14634

Merged 18 commits on Jul 23, 2025.

7 changes: 6 additions & 1 deletion .github/workflows/create_search_index.yml
@@ -11,7 +11,8 @@ concurrency:

jobs:
jekyll:
runs-on: ubuntu-latest
runs-on: RAM32GB
timeout-minutes: 600
environment: jekyll
steps:
- uses: actions/checkout@v2
@@ -49,6 +50,8 @@ jobs:
ELASTICSEARCH_INDEX_NAME: ${{ secrets.ELASTICSEARCH_INDEX_NAME }}
SEARCH_ORIGIN: ${{ secrets.SEARCH_ORIGIN }}
ORIGIN: ${{ secrets.ORIGIN }}
AWS_ACCESS_KEY_ID: ${{ secrets.MODELS_PUBLIC_KEY }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.MODELS_SECRET_KEY }}
working-directory: docs
run: |
bundle exec jekyll build --incremental
@@ -62,6 +65,8 @@ jobs:
ELASTICSEARCH_INDEX_NAME: ${{ secrets.ELASTICSEARCH_INDEX_NAME }}
SEARCH_ORIGIN: ${{ secrets.SEARCH_ORIGIN }}
ORIGIN: ${{ secrets.ORIGIN }}
AWS_ACCESS_KEY_ID: ${{ secrets.MODELS_PUBLIC_KEY }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.MODELS_SECRET_KEY }}
working-directory: docs
run: |
rm -f .jekyll-metadata
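Both build steps now export `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, the environment variable names the AWS SDK for Ruby reads through its default credential chain; that is how the `Aws::S3::Client` added to `docs/_plugins/search_index.rb` can authenticate without explicit keys. A minimal pre-flight sketch, assuming the plugin should fail fast when those variables are empty; `REQUIRED_AWS_VARS` and `missing_aws_vars` are hypothetical names, and the check deliberately ignores the SDK's other credential sources (shared config files, instance profiles).

```ruby
# Hypothetical pre-flight check: the workflow maps repository secrets to these
# two variables, which the AWS SDK for Ruby picks up from the environment.
REQUIRED_AWS_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

# Returns the names of required variables that are unset or blank.
# Accepts ENV or any Hash-like object, which keeps it easy to test.
def missing_aws_vars(env = ENV)
  REQUIRED_AWS_VARS.select { |name| env[name].nil? || env[name].strip.empty? }
end
```

A caller could abort the index build with a clear message when `missing_aws_vars` is non-empty, instead of discovering the problem only when the upload fails.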
16 changes: 16 additions & 0 deletions CHANGELOG
@@ -1,3 +1,19 @@
=======
6.1.0
=======
---------------------------
New Features & Enhancements
---------------------------

* [SPARKNLP-1189] Introducing Phi4
* [SPARKNLP-1259] Introducing Reader2Doc Annotator
* [SPARKNLP-1194] Upgrade jsl-llamacpp to newest version

---------
Bug Fixes
---------
* Fix HuggingFace_OpenVINO_in_Spark_NLP_Qwen2VL.ipynb

=======
6.0.5
=======
19 changes: 10 additions & 9 deletions README.md
@@ -63,7 +63,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp==6.0.5 pyspark==3.3.1
$ pip install spark-nlp==6.1.0 pyspark==3.3.1
```

In Python console or Jupyter `Python3` kernel:
@@ -129,11 +129,11 @@ For a quick example of using pipelines and models take a look at our official [d

### Apache Spark Support

Spark NLP *6.0.5* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
Spark NLP *6.1.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x

| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 6.0.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 6.x.x and up | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.5.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
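The support matrix can be read as a simple lookup. A sketch covering only the rows visible in this hunk (5.3.x through 6.x), where every Spark NLP line supports Apache Spark 3.0.x to 3.5.x and none supports 2.3.x or 2.4.x; `spark_supported?` and its constants are hypothetical helpers, not part of Spark NLP.

```ruby
# Hypothetical lookup built from the rows shown in this hunk only.
SUPPORTED_SPARK = %w[3.0 3.1 3.2 3.3 3.4 3.5].freeze
LISTED_SPARK_NLP = %w[5.3 5.4 5.5 6].freeze

# True when the given Apache Spark version is marked YES for the Spark NLP line.
# Raises for Spark NLP versions outside the rows modelled here.
def spark_supported?(spark_nlp_version, spark_version)
  listed = LISTED_SPARK_NLP.any? do |v|
    spark_nlp_version == v || spark_nlp_version.start_with?("#{v}.")
  end
  raise ArgumentError, "#{spark_nlp_version} is not in this excerpt" unless listed

  # Compare on major.minor, e.g. "3.3.1" -> "3.3".
  SUPPORTED_SPARK.include?(spark_version.split(".").first(2).join("."))
end
```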
@@ -159,24 +159,25 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http

### Databricks Support

Spark NLP 6.0.5 has been tested and is compatible with the following runtimes:
Spark NLP 6.1.0 has been tested and is compatible with the following runtimes:

| **CPU** | **GPU** |
|--------------------|--------------------|
| 14.1 / 14.1 ML | 14.1 ML & GPU |
| 14.2 / 14.2 ML | 14.2 ML & GPU |
| 14.3 / 14.3 ML | 14.3 ML & GPU |
| 15.0 / 15.0 ML | 15.0 ML & GPU |
| 15.1 / 15.0 ML | 15.1 ML & GPU |
| 15.2 / 15.0 ML | 15.2 ML & GPU |
| 15.3 / 15.0 ML | 15.3 ML & GPU |
| 15.4 / 15.0 ML | 15.4 ML & GPU |
| 15.1 / 15.1 ML | 15.1 ML & GPU |
| 15.2 / 15.2 ML | 15.2 ML & GPU |
| 15.3 / 15.3 ML | 15.3 ML & GPU |
| 15.4 / 15.4 ML | 15.4 ML & GPU |
| 16.4 / 16.4 ML | 16.4 ML & GPU |

We are compatible with older runtimes. For a full list check databricks support in our official [documentation](https://sparknlp.org/docs/en/install#databricks-support)
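The corrected Databricks rows replace entries such as `15.1 / 15.0 ML`, where the ML runtime label did not match the plain runtime. A sketch of a hypothetical lint that would flag such rows, assuming each CPU cell has the form `X.Y / X.Y ML` and each GPU cell `X.Y ML & GPU`; `ROW` and `mismatched_rows` are illustrative names.

```ruby
# Matches a Databricks table row such as "| 15.1 / 15.1 ML | 15.1 ML & GPU |",
# tolerating the padding spaces used in the README.
ROW = /^\|\s*(\d+\.\d+)\s*\/\s*(\d+\.\d+)\s*ML\s*\|\s*(\d+\.\d+)\s*ML & GPU\s*\|/

# Returns the stripped rows whose three runtime versions disagree.
# Header and separator lines do not match ROW and are skipped.
def mismatched_rows(table_text)
  table_text.each_line.filter_map do |line|
    m = ROW.match(line)
    next unless m

    cpu, cpu_ml, gpu_ml = m.captures
    line.strip unless cpu == cpu_ml && cpu == gpu_ml
  end
end
```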

### EMR Support

Spark NLP 6.0.5 has been tested and is compatible with the following EMR releases:
Spark NLP 6.1.0 has been tested and is compatible with the following EMR releases:

| **EMR Release** |
|--------------------|
2 changes: 1 addition & 1 deletion build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)

organization := "com.johnsnowlabs.nlp"

version := "6.0.5"
version := "6.1.0"

(ThisBuild / scalaVersion) := scalaVer
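This release bumps the same version string in `build.sbt`, `conda/meta.yaml`, and the README install command. A hypothetical consistency check, assuming the exact line formats shown in this diff; `VERSION_PATTERNS`, `extract_versions`, and `versions_consistent?` are illustrative names, and the patterns would need adjusting for other files.

```ruby
# One pattern per file, mirroring the lines this PR edits.
VERSION_PATTERNS = {
  "build.sbt"       => /^version := "([^"]+)"/,
  "conda/meta.yaml" => /\{% set version = "([^"]+)" %\}/,
  "README.md"       => /pip install spark-nlp==(\S+)/
}.freeze

# Maps each file name to the version found in its contents (nil if absent).
def extract_versions(contents_by_file)
  contents_by_file.to_h do |file, text|
    match = VERSION_PATTERNS.fetch(file).match(text)
    [file, match && match[1]]
  end
end

# True when every file declares a version and all of them agree.
def versions_consistent?(contents_by_file)
  versions = extract_versions(contents_by_file).values
  !versions.include?(nil) && versions.uniq.size == 1
end
```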

4 changes: 2 additions & 2 deletions conda/meta.yaml
@@ -1,13 +1,13 @@
{% set name = "spark-nlp" %}
{% set version = "6.0.5" %}
{% set version = "6.1.0" %}

package:
name: {{ name|lower }}
version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark_nlp-{{ version }}.tar.gz
sha256: 0610b5b78db44b934764e1a4bdfe2fd425e6e6bc03104aeefc83f0e5c9e2808e
sha256: 1356e0839868a6c4b5f6befad7e937e43864b57e4a0feb168b06395906136a27

build:
noarch: python
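The new `sha256` value has to match the `spark_nlp-6.1.0.tar.gz` sdist that the `source.url` template resolves to, since conda-build verifies the download against it. A minimal sketch for checking a locally downloaded tarball against that field with Ruby's standard `digest` library; `sdist_sha256` and `sdist_matches?` are hypothetical helper names.

```ruby
require 'digest'

# Computes the hex SHA-256 of a file, streaming it from disk.
def sdist_sha256(path)
  Digest::SHA256.file(path).hexdigest
end

# True when the file's digest equals the expected value (case-insensitive).
def sdist_matches?(path, expected)
  sdist_sha256(path) == expected.downcase
end
```

For example, `sdist_matches?("spark_nlp-6.1.0.tar.gz", "<sha256 from meta.yaml>")` would be run against the tarball fetched from PyPI before updating the recipe.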
3 changes: 3 additions & 0 deletions docs/Gemfile
@@ -11,6 +11,9 @@ gem "webrick"

gem "jekyll", "~> 3.9"

gem "aws-sdk-s3", "~>1"


group "jekyll-plugins" do
gem "jekyll-incremental", "0.1.0", path: "_plugins/jekyll-incremental"
end
22 changes: 22 additions & 0 deletions docs/Gemfile.lock
@@ -15,6 +15,25 @@ GEM
zeitwerk (~> 2.2, >= 2.2.2)
addressable (2.8.1)
public_suffix (>= 2.0.2, < 6.0)
aws-eventstream (1.4.0)
aws-partitions (1.1126.0)
aws-sdk-core (3.226.2)
aws-eventstream (~> 1, >= 1.3.0)
aws-partitions (~> 1, >= 1.992.0)
aws-sigv4 (~> 1.9)
base64
jmespath (~> 1, >= 1.6.1)
logger
aws-sdk-kms (1.106.0)
aws-sdk-core (~> 3, >= 3.225.0)
aws-sigv4 (~> 1.5)
aws-sdk-s3 (1.192.0)
aws-sdk-core (~> 3, >= 3.225.0)
aws-sdk-kms (~> 1)
aws-sigv4 (~> 1.5)
aws-sigv4 (1.12.1)
aws-eventstream (~> 1, >= 1.0.2)
base64 (0.3.0)
coffee-script (2.4.1)
coffee-script-source
execjs
@@ -233,6 +252,7 @@ GEM
gemoji (~> 3.0)
html-pipeline (~> 2.2)
jekyll (>= 3.0, < 5.0)
jmespath (1.6.2)
kramdown (2.3.2)
rexml
kramdown-parser-gfm (1.1.0)
@@ -241,6 +261,7 @@ GEM
listen (3.8.0)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
logger (1.7.0)
mercenary (0.3.6)
mini_portile2 (2.8.1)
minima (2.5.1)
@@ -301,6 +322,7 @@ PLATFORMS
x86_64-linux

DEPENDENCIES
aws-sdk-s3 (~> 1)
elasticsearch (~> 7.10)
github-pages (= 227)
jekyll (~> 3.9)
23 changes: 21 additions & 2 deletions docs/_plugins/search_index.rb
@@ -5,7 +5,9 @@
require 'date'
require 'elasticsearch'
require 'nokogiri'
require 'aws-sdk-s3'

BUCKET_NAME="pypi.johnsnowlabs.com"
SEARCH_URL = (ENV["SEARCH_ORIGIN"] || 'https://search.modelshub.johnsnowlabs.com') + '/'
ELASTICSEARCH_INDEX_NAME = ENV["ELASTICSEARCH_INDEX_NAME"] || 'models'

@@ -17,6 +19,18 @@

$remote_editions = Set.new

def upload_file_to_s3_bucket(file_path)
  s3 = Aws::S3::Client.new(region: 'eu-west-1')
  object_key = "public/models.json"
  begin
    s3.put_object(bucket: BUCKET_NAME, key: object_key, body: File.open(file_path, 'rb'), acl: 'public-read')
    puts "File uploaded successfully to #{BUCKET_NAME}/#{object_key}"

  rescue Aws::S3::Errors::ServiceError => e
    puts "Failed to upload file: #{e.message}"
  end
end

class Version < Array
  def initialize name
    m = /(\d+\.\d+)\z/.match(name)
@@ -252,7 +266,7 @@ def initialize(client)

def index(id, data)
@buffer << { update: { _id: id, data: {doc: data, doc_as_upsert: true}} }
self.execute if @buffer.length >= 100
self.execute if @buffer.length >= 500
end

def execute
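The change above raises the flush threshold from 100 to 500 buffered actions, so each bulk request carries up to five times as many upserts and fewer requests are sent overall. A sketch of that buffering pattern, assuming a client object that accepts a whole batch in one call; `BufferedIndexer`, the `flush_at:` keyword, and the lambda client are illustrative stand-ins, not the plugin's class or the `elasticsearch` gem's API (the real `execute` body is not shown in this hunk).

```ruby
# Queues upsert actions and sends them in batches once a threshold is hit.
class BufferedIndexer
  attr_reader :buffer

  # client: any object responding to #call(batch); flush_at: batch size.
  def initialize(client, flush_at: 500)
    @client = client
    @flush_at = flush_at
    @buffer = []
  end

  # Uses the same action shape as the plugin's index method.
  def index(id, data)
    @buffer << { update: { _id: id, data: { doc: data, doc_as_upsert: true } } }
    execute if @buffer.length >= @flush_at
  end

  # Sends whatever is queued as one batch, then empties the buffer.
  def execute
    return if @buffer.empty?

    @client.call(@buffer.dup)
    @buffer.clear
  end
end
```

Actions left below the threshold stay queued until `execute` is called explicitly, so a final flush after the last `index` call is still required. The trade-off of a larger threshold is bigger request bodies in exchange for fewer round trips.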
@@ -578,9 +592,14 @@ def is_latest?(group, model)
models_references_json = backup_references_data.merge(models_references_json)
end

filename = File.join(site.config['destination'], 'models.json')
filename = File.join(site.config['destination'], 'backup-modelss3.json')

File.write(filename, models_json.values.to_json)
File.write(backup_filename, models_json.to_json)
# models.json moved to pypi s3 bucket
upload_file_to_s3_bucket(filename)

File.delete(filename)

benchmarking_filename = File.join(site.config['destination'], 'benchmarking.json')
File.write(benchmarking_filename, models_benchmarking_json.to_json)
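Per the comment in this hunk, `models.json` moves to the pypi S3 bucket: the JSON is written to a local file, uploaded under `public/models.json`, and the local copy is deleted. A sketch of that hand-off, assuming a callable `uploader` standing in for `upload_file_to_s3_bucket`; `publish_models_json` is a hypothetical name. Placing the delete in an `ensure` clause removes the local file even when the upload raises, which plain sequential calls do not guarantee.

```ruby
require 'json'

# Writes the model list to `path`, hands the file to `uploader`, and always
# removes the local copy afterwards, even if the upload raises.
def publish_models_json(models, path, uploader)
  File.write(path, models.values.to_json)
  uploader.call(path)
ensure
  File.delete(path) if File.exist?(path)
end
```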
8 changes: 4 additions & 4 deletions docs/api/com/index.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/CloudClient.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudClient</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudClient" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.CloudClient" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.CloudClient</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.CloudClient" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.CloudClient" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/CloudManager.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudManager</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudManager" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.CloudManager" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.CloudManager</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.CloudManager" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.CloudManager" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/CloudResources$.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudResources</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudResources" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.CloudResources" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.CloudResources</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.CloudResources" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.CloudResources" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/CloudStorage.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudStorage</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.CloudStorage" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.CloudStorage" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.CloudStorage</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.CloudStorage" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.CloudStorage" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.aws.AWSAnonymousCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">
8 changes: 4 additions & 4 deletions docs/api/com/johnsnowlabs/client/aws/AWSBasicCredentials.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials</title>
<meta name="description" content="Spark NLP 6.0.5 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta name="keywords" content="Spark NLP 6.0.5 ScalaDoc com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<title>Spark NLP 6.1.0-rc1 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials</title>
<meta name="description" content="Spark NLP 6.1.0 - rc1 ScalaDoc - com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta name="keywords" content="Spark NLP 6.1.0 rc1 ScalaDoc com.johnsnowlabs.client.aws.AWSBasicCredentials" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


@@ -28,7 +28,7 @@
</head>
<body>
<div id="search">
<span id="doc-title">Spark NLP 6.0.5 ScalaDoc<span id="doc-version"></span></span>
<span id="doc-title">Spark NLP 6.1.0-rc1 ScalaDoc<span id="doc-version"></span></span>
<span class="close-results"><span class="left">&lt;</span> Back</span>
<div id="textfilter">
<span class="input">