@@ -166,7 +166,7 @@ To use Spark NLP you need the following requirements:

**GPU (optional):**

- Spark NLP 5.4.0-rc1 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The following NVIDIA® software is only required for GPU support:
+ Spark NLP 5.4.0-rc2 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The following NVIDIA® software is only required for GPU support:

- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2

@@ -182,7 +182,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.4.0-rc1 pyspark==3.3.1
+ $ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1
```

In Python console or Jupyter `Python3` kernel:
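A minimal smoke test there might look like the following (a sketch assuming the standard `sparknlp` Python API; the versions printed are illustrative):

```python
# Minimal check that spark-nlp and pyspark are wired together.
import sparknlp

# start() creates (or reuses) a SparkSession preconfigured for Spark NLP.
spark = sparknlp.start()

print(sparknlp.version())   # e.g. 5.4.0-rc2
print(spark.version)        # e.g. 3.3.1
```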

@@ -227,7 +227,7 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh

## Apache Spark Support

- Spark NLP *5.4.0-rc1* has been built on top of Apache Spark 3.4 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+ Spark NLP *5.4.0-rc2* has been built on top of Apache Spark 3.4 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x

| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|

@@ -271,7 +271,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

## Databricks Support

- Spark NLP 5.4.0-rc1 has been tested and is compatible with the following runtimes:
+ Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtimes:

**CPU:**

@@ -344,7 +344,7 @@ Spark NLP 5.4.0-rc1 has been tested and is compatible with the following runtime

## EMR Support

- Spark NLP 5.4.0-rc1 has been tested and is compatible with the following EMR releases:
+ Spark NLP 5.4.0-rc2 has been tested and is compatible with the following EMR releases:

- emr-6.2.0
- emr-6.3.0

@@ -394,11 +394,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

The `spark-nlp` has been published to
@@ -407,11 +407,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# GPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
```

@@ -421,11 +421,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# AArch64

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
```

@@ -435,11 +435,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# M1/M2 (Apple Silicon)

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
```
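Rather than choosing the artifact on the command line, the same selection can typically be made from Python via flags on `sparknlp.start()` (a sketch based on the documented `start()` options; treat the exact flag names as assumptions, since they can vary between releases):

```python
import sparknlp

# Each flag maps to one of the Maven artifacts listed above.
spark = sparknlp.start()                      # CPU: spark-nlp_2.12
# spark = sparknlp.start(gpu=True)            # GPU: spark-nlp-gpu_2.12
# spark = sparknlp.start(aarch64=True)        # Linux ARM64: spark-nlp-aarch64_2.12
# spark = sparknlp.start(apple_silicon=True)  # Apple M1/M2: spark-nlp-silicon_2.12
```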

@@ -453,7 +453,7 @@ set in your SparkSession:
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

## Scala

@@ -471,7 +471,7 @@ coordinates:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.4.0-rc1</version>
+    <version>5.4.0-rc2</version>
</dependency>
```

@@ -482,7 +482,7 @@ coordinates:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.4.0-rc1</version>
+    <version>5.4.0-rc2</version>
</dependency>
```

@@ -493,7 +493,7 @@ coordinates:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.4.0-rc1</version>
+    <version>5.4.0-rc2</version>
</dependency>
```

@@ -504,7 +504,7 @@ coordinates:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.4.0-rc1</version>
+    <version>5.4.0-rc2</version>
</dependency>
```

@@ -514,28 +514,28 @@ coordinates:

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc2"
```

**spark-nlp-gpu:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc2"
```

**spark-nlp-aarch64:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc2"
```

**spark-nlp-silicon:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc2"
```

Maven

@@ -557,7 +557,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through

Pip:

```bash
- pip install spark-nlp==5.4.0-rc1
+ pip install spark-nlp==5.4.0-rc2
```

Conda:

@@ -586,7 +586,7 @@ spark = SparkSession.builder
    .config("spark.driver.memory", "16G")
    .config("spark.driver.maxResultSize", "0")
    .config("spark.kryoserializer.buffer.max", "2000M")
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
    .getOrCreate()
```
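The same configuration can usually be expressed more compactly through `sparknlp.start()` (a sketch assuming its documented `memory` keyword; defaults may differ between releases):

```python
import sparknlp

# Equivalent shortcut: start() builds a SparkSession with Spark NLP's
# recommended serializer and buffer settings applied for you.
spark = sparknlp.start(memory="16G")
print(spark.sparkContext.getConf().get("spark.jars.packages"))
```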

@@ -657,7 +657,7 @@ Use either one of the following options

- Add the following Maven Coordinates to the interpreter's library list

```bash
- com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

- Add a path to a pre-built jar from [here](#compiled-jars) in the interpreter's library list, making sure the jar is

@@ -668,7 +668,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1

Apart from the previous step, install the Python module through pip

```bash
- pip install spark-nlp==5.4.0-rc1
+ pip install spark-nlp==5.4.0-rc2
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:

@@ -696,7 +696,7 @@ launch the Jupyter from the same Python environment:
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==5.4.0-rc1 pyspark==3.3.1 jupyter
+ $ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1 jupyter
$ jupyter notebook
```

@@ -713,7 +713,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

Alternatively, you can mix in using the `--jars` option for pyspark + `pip install spark-nlp`

@@ -740,7 +740,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc1
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
```
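After the setup script finishes, a first cell in the same Colab notebook might look like this (a sketch; `gpu=True` only makes sense if the `-g` option was used on a GPU runtime):

```python
import sparknlp

# Start a session against the freshly installed pyspark + spark-nlp.
spark = sparknlp.start()   # or sparknlp.start(gpu=True) on a GPU runtime
print(sparknlp.version())
```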

[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)

@@ -763,7 +763,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
- !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc1
+ !wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
```

[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live

@@ -782,9 +782,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP

3. In the `Libraries` tab of your cluster you need to follow these steps:

- 3.1. Install New -> PyPI -> `spark-nlp==5.4.0-rc1` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==5.4.0-rc2` -> Install

- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2` -> Install

4. Now you can attach your notebook to the cluster and use Spark NLP!
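As a quick sanity check once the libraries are attached, a notebook cell along these lines can confirm the install (a sketch that uses the built-in `spark` session Databricks provides):

```python
import sparknlp
from sparknlp.base import DocumentAssembler

print(sparknlp.version())  # expect 5.4.0-rc2

# Run one transformer end-to-end to prove the JVM side is wired up.
df = spark.createDataFrame([["Spark NLP is attached to this cluster."]], ["text"])
DocumentAssembler().setInputCol("text").setOutputCol("document") \
    .transform(df).show(truncate=False)
```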

@@ -835,7 +835,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
      "spark.kryoserializer.buffer.max": "2000M",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.driver.maxResultSize": "0",
-      "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1"
+      "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2"
    }
  }]
```

@@ -844,7 +844,7 @@ A sample of AWS CLI to launch EMR cluster:

```sh
aws emr create-cluster \
- --name "Spark NLP 5.4.0-rc1" \
+ --name "Spark NLP 5.4.0-rc2" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \

@@ -908,7 +908,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

2. On an existing cluster, install the spark-nlp and spark-nlp-display packages from PyPI.

@@ -951,7 +951,7 @@ spark = SparkSession.builder
    .config("spark.kryoserializer.buffer.max", "2000m")
    .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
    .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
    .getOrCreate()
```
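Once such a session is up, the custom locations can be read back from the runtime config, which is a convenient way to confirm they took effect (a small sketch):

```python
# Confirm the Spark NLP settings landed in the active session.
print(spark.conf.get("spark.jsl.settings.pretrained.cache_folder"))
print(spark.conf.get("spark.jsl.settings.storage.cluster_tmp_dir"))
```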

@@ -965,7 +965,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

**pyspark:**

@@ -978,7 +978,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
```

**Databricks:**

@@ -1250,7 +1250,7 @@ spark = SparkSession.builder
    .config("spark.driver.memory", "16G")
    .config("spark.driver.maxResultSize", "0")
    .config("spark.kryoserializer.buffer.max", "2000M")
-    .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc1.jar")
+    .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc2.jar")
    .getOrCreate()
```

@@ -1259,7 +1259,7 @@ spark = SparkSession.builder
version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
- If you are local, you can load the Fat JAR from your local FileSystem; however, in a cluster setup you need
to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
- e.g., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc1.jar`)
+ e.g., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc2.jar`)

Example of using pretrained models and pipelines offline:
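The usual pattern is to download a model or pipeline once, ship the saved directory to the cluster's filesystem, and load it by path instead of by name. A hedged sketch follows, assuming an active `spark` session; `/tmp/explain_document_dl_en` is a hypothetical local path and `explain_document_dl` is just an example pipeline name:

```python
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import PipelineModel

# Online machine: fetch once and save the underlying PipelineModel to disk.
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
pipeline.model.save("/tmp/explain_document_dl_en")  # hypothetical path

# Offline cluster: load from the (distributed) filesystem, no internet needed.
offline_pipeline = PipelineModel.load("/tmp/explain_document_dl_en")
result = offline_pipeline.transform(
    spark.createDataFrame([["Hello world"]], ["text"]))
result.show(truncate=False)
```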