Commit 12e0d9f

Allow Xlsx Reader to Specify ParseHuge Release222 (#4517)
* Allow Xlsx Reader to Specify ParseHuge Release222

Fix #4260. A number of Security Advisories related to libxml_options were opened. In the end, we disabled the ability to specify any libxml_options. However, some users were adversely affected because they needed LIBXML_PARSEHUGE for some of their files. Having finally obtained access to a file demonstrating this problem, we can restore this ability.

- The operation is potentially dangerous, a vector for memory leaks and out-of-memory errors. It is not recommended unless absolutely needed.
- It will not be permitted as a global (static) property with the ability to adversely affect other users on the same server.
- It will instead be implemented as an instance property of Xlsx Reader (default to false), with a setter. I do not see a use case for a getter.
- People will need to set this property individually for each file which they think needs it.
- This change will be backported to all supported releases.
- The sheer size and processing time for the file involved makes it impractical to add a formal test case. It has, nevertheless, been tested satisfactorily.

* Unneeded Blank Line

* Update CHANGELOG.md
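
The per-instance design described in the commit message can be illustrated with a minimal, self-contained sketch. `TinyXmlReader` and its `parse()` method are hypothetical stand-ins invented for this illustration and are not part of PhpSpreadsheet. Only the shape (a private boolean defaulting to false, a setter, and a conditional `LIBXML_PARSEHUGE` flag passed to `simplexml_load_string`) mirrors what the commit describes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a reader class; NOT part of PhpSpreadsheet.
 * It only mirrors the instance-property pattern described in the commit.
 */
final class TinyXmlReader
{
    // Off by default, mirroring the commit's default of false.
    private bool $parseHuge = false;

    // Setter only; the flag is scoped to this one object, never global.
    public function setParseHuge(bool $parseHuge): void
    {
        $this->parseHuge = $parseHuge;
    }

    public function parse(string $xml): SimpleXMLElement
    {
        $element = simplexml_load_string(
            $xml,
            SimpleXMLElement::class,
            // Relax libxml's built-in size limits only when explicitly requested.
            $this->parseHuge ? LIBXML_PARSEHUGE : 0
        );
        if ($element === false) {
            throw new RuntimeException('XML could not be parsed.');
        }

        return $element;
    }
}

$default = new TinyXmlReader();
$huge = new TinyXmlReader();
$huge->setParseHuge(true); // affects only $huge, not $default

$xml = '<sheet><row>1</row><row>2</row></sheet>';
$a = $default->parse($xml);
$b = $huge->parse($xml);
```

Going by the setter added in this commit, the equivalent with the real library would presumably be to call `setParseHuge(true)` on a `PhpOffice\PhpSpreadsheet\Reader\Xlsx` instance before calling `load()` on it, and to do so only for the specific files that fail without it.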
1 parent 448a343 commit 12e0d9f

2 files changed, +25 -7 lines changed
CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com)
 and this project adheres to [Semantic Versioning](https://semver.org).

-# TBD - 2.3.9
+# 2025-06-22 - 2.3.9

 ### Changed

@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org).

 - TEXT and TIMEVALUE functions. [Issue #4249](https://github.com/PHPOffice/PhpSpreadsheet/issues/4249) [PR #4354](https://github.com/PHPOffice/PhpSpreadsheet/pull/4354)
 - Removing Columns/Rows Containing Merged Cells. Backport of [PR #4465](https://github.com/PHPOffice/PhpSpreadsheet/pull/4465)
+- Allow Xlsx Reader to Specify ParseHuge. [Issue #4260](https://github.com/PHPOffice/PhpSpreadsheet/issues/4260) [PR #4517](https://github.com/PHPOffice/PhpSpreadsheet/pull/4517)

 # 2025-02-07 - 2.3.8

src/PhpSpreadsheet/Reader/Xlsx.php

Lines changed: 23 additions & 6 deletions
@@ -58,6 +58,19 @@ class Xlsx extends BaseReader

     private array $sharedFormulae = [];

+    private bool $parseHuge = false;
+
+    /**
+     * Allow use of LIBXML_PARSEHUGE.
+     * This option can lead to memory leaks and failures,
+     * and is not recommended. But some very large spreadsheets
+     * seem to require it.
+     */
+    public function setParseHuge(bool $parseHuge): void
+    {
+        $this->parseHuge = $parseHuge;
+    }
+
     /**
      * Create a new Xlsx Reader instance.
      */
@@ -119,8 +132,8 @@ private function loadZip(string $filename, string $ns = '', bool $replaceUnclose
         }
         $rels = @simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             $ns
         );

@@ -134,8 +147,8 @@ private function loadZipNonamespace(string $filename, string $ns): SimpleXMLElem
         $contents = $this->getFromZipArchive($this->zip, $filename);
         $rels = simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             ($ns === '' ? $ns : '')
         );

@@ -249,7 +262,9 @@ public function listWorksheetInfo(string $filename): array
                             $this->zip,
                             $fileWorksheetPath
                         )
-                    )
+                    ),
+                    null,
+                    $this->parseHuge ? LIBXML_PARSEHUGE : 0
                 );
                 $xml->setParserProperty(2, true);

@@ -1977,7 +1992,9 @@ private function readRibbon(Spreadsheet $excel, string $customUITarget, ZipArchi
             // exists and not empty if the ribbon have some pictures (other than internal MSO)
             $UIRels = simplexml_load_string(
                 $this->getSecurityScannerOrThrow()
-                    ->scan($dataRels)
+                    ->scan($dataRels),
+                SimpleXMLElement::class,
+                $this->parseHuge ? LIBXML_PARSEHUGE : 0
             );
             if (false !== $UIRels) {
                 // we need to save id and target to avoid parsing customUI.xml and "guess" if it's a pseudo callback who load the image
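
The `listWorksheetInfo` hunk differs from the others in that it appears to pass the flag to a streaming `XMLReader` rather than to `simplexml_load_string`, with `null` as the encoding argument so the flags can follow it. The sketch below shows that calling convention in isolation. `countElements` is a hypothetical helper written for this illustration and does not exist in PhpSpreadsheet; the sample XML is likewise made up.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PhpSpreadsheet): stream through an XML
 * string and count the elements with the given local name.
 */
function countElements(string $xmlSource, string $elementName, bool $parseHuge = false): int
{
    $reader = new XMLReader();
    // Second argument is the encoding (null leaves detection to libxml);
    // third argument carries the libxml option flags.
    $reader->XML($xmlSource, null, $parseHuge ? LIBXML_PARSEHUGE : 0);

    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $elementName) {
            ++$count;
        }
    }
    $reader->close();

    return $count;
}

$sample = '<sheetData><row/><row/><row/></sheetData>';
$withoutFlag = countElements($sample, 'row');
$withFlag = countElements($sample, 'row', true);
```

For a small document the flag makes no observable difference; it only matters once the input exceeds libxml's default limits, which is the situation the commit message describes.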
