Commit e4bb64e

feat: Configurable null character sanitization (#434)
Null characters are currently passed to Postgres as-is, even though Postgres does not support them. When one is encountered, the sink fails, as noted in #60, with an error like `ValueError: A string literal cannot contain NUL (0x00) characters.` This PR introduces a new option, `sanitize_null_text_characters`, which enables sanitization of these characters.

---------

Co-authored-by: Edgar Ramírez Mondragón <16805946+edgarrmondragon@users.noreply.github.com>
Co-authored-by: Edgar Ramírez-Mondragón <edgarrm358@gmail.com>
1 parent fca6aa3 commit e4bb64e
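
The sanitization logic itself is not part of the hunks shown on this page, so the following is only a minimal sketch of what stripping NUL characters from a record could look like. The helper name `sanitize_null_text` and the choice to delete the character (rather than replace it with something else) are assumptions made for illustration, not the commit's actual implementation.

```python
from typing import Any

NUL = "\x00"


def sanitize_null_text(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with NUL removed from top-level string values.

    Hypothetical helper: non-string values are passed through unchanged.
    """
    return {
        key: value.replace(NUL, "") if isinstance(value, str) else value
        for key, value in record.items()
    }


# A record whose text field would trigger the NUL error described above.
record = {"id": 1, "name": "Al\x00ice", "note": None}
clean = sanitize_null_text(record)
```

Because the sketch builds a new dictionary, the incoming record is left untouched, which keeps the behavior easy to reason about when the option is toggled on and off.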

4 files changed: +81 -29 lines changed

README.md

Lines changed: 28 additions & 27 deletions
@@ -24,33 +24,34 @@ This target is tested with all actively supported [Python](https://devguide.pyth
 
 ## Settings
 
-| Setting | Required | Default | Description |
-| :------------------------------ | :------- | :--------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| host | False | None | Hostname for postgres instance. |
-| port | False | 5432 | The port on which postgres is awaiting connections. |
-| user | False | None | User name used to authenticate. |
-| password | False | None | Password used to authenticate. |
-| database | False | None | Database name. |
-| use_copy | False | None | Use the COPY command to insert data. This is usually faster than INSERT statements. This option is only available for the postgres+psycopg dialect+driver combination. |
-| default_target_schema | False | melty | Postgres schema to send data to, example: tap-clickup |
-| activate_version | False | 1 | If set to false, the tap will ignore activate version messages. If set to true, add_record_metadata must be set to true as well. |
-| hard_delete | False | 0 | When activate version is sent from a tap this specefies if we should delete the records that don't match, or mark them with a date in the `_sdc_deleted_at` column. This config option is ignored if `activate_version` is set to false. |
-| add_record_metadata | False | 1 | Note that this must be enabled for activate_version to work!This adds _sdc_extracted_at, _sdc_batched_at, and more to every table. See https://sdk.meltano.com/en/latest/implementation/record_metadata.html for more information. |
-| interpret_content_encoding | False | 0 | If set to true, the target will interpret the content encoding of the schema to determine how to store the data. Using this option may result in a more efficient storage of the data but may also result in an error if the data is not encoded as expected. |
-| ssl_enable | False | 0 | Whether or not to use ssl to verify the server's identity. Use ssl_certificate_authority and ssl_mode for further customization. To use a client certificate to authenticate yourself to the server, use ssl_client_certificate_enable instead. |
-| ssl_client_certificate_enable | False | 0 | Whether or not to provide client-side certificates as a method of authentication to the server. Use ssl_client_certificate and ssl_client_private_key for further customization. To use SSL to verify the server's identity, use ssl_enable instead. |
-| ssl_mode | False | verify-full | SSL Protection method, see [postgres documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) for more information. Must be one of disable, allow, prefer, require, verify-ca, or verify-full. |
-| ssl_certificate_authority | False | ~/.postgresql/root.crl | The certificate authority that should be used to verify the server's identity. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
-| ssl_client_certificate | False | ~/.postgresql/postgresql.crt | The certificate that should be used to verify your identity to the server. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
-| ssl_client_private_key | False | ~/.postgresql/postgresql.key | The private key for the certificate you provided. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
-| ssl_storage_directory | False | .secrets | The folder in which to store SSL certificates provided as raw values. When a certificate/key is provided as a raw value instead of as a filepath, it must be written to a file before it can be used. This configuration option determines where that file is created. |
-| ssh_tunnel | False | None | SSH Tunnel Configuration, this is a json object |
-| ssh_tunnel.enable | False | 0 | Enable an ssh tunnel (also known as bastion host), see the other ssh_tunnel.* properties for more details |
-| ssh_tunnel.host | False | None | Host of the bastion host, this is the host we'll connect to via ssh |
-| ssh_tunnel.username | False | None | Username to connect to bastion host |
-| ssh_tunnel.port | False | 22 | Port to connect to bastion host |
-| ssh_tunnel.private_key | False | None | Private Key for authentication to the bastion host |
-| ssh_tunnel.private_key_password | False | None | Private Key Password, leave None if no password is set |
+| Setting | Required | Default | Description |
+| :------------------------------ | :------- | :--------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| host | False | None | Hostname for postgres instance. |
+| port | False | 5432 | The port on which postgres is awaiting connections. |
+| user | False | None | User name used to authenticate. |
+| password | False | None | Password used to authenticate. |
+| database | False | None | Database name. |
+| use_copy | False | None | Use the COPY command to insert data. This is usually faster than INSERT statements. This option is only available for the postgres+psycopg dialect+driver combination. |
+| default_target_schema | False | melty | Postgres schema to send data to, example: tap-clickup |
+| activate_version | False | 1 | If set to false, the tap will ignore activate version messages. If set to true, add_record_metadata must be set to true as well. |
+| hard_delete | False | 0 | When activate version is sent from a tap this specefies if we should delete the records that don't match, or mark them with a date in the `_sdc_deleted_at` column. This config option is ignored if `activate_version` is set to false. |
+| add_record_metadata | False | 1 | Note that this must be enabled for activate_version to work!This adds _sdc_extracted_at, _sdc_batched_at, and more to every table. See https://sdk.meltano.com/en/latest/implementation/record_metadata.html for more information. |
+| interpret_content_encoding | False | 0 | If set to true, the target will interpret the content encoding of the schema to determine how to store the data. Using this option may result in a more efficient storage of the data but may also result in an error if the data is not encoded as expected. |
+| sanitize_null_text_characters | False | 0 | If set to true, the target will sanitize null characters in char/text/varchar fields, as they are not supported by Postgres. See [postgres documentation](https://www.postgresql.org/docs/current/functions-string.html) for more information about chr(0) not being supported. |
+| ssl_enable | False | 0 | Whether or not to use ssl to verify the server's identity. Use ssl_certificate_authority and ssl_mode for further customization. To use a client certificate to authenticate yourself to the server, use ssl_client_certificate_enable instead. |
+| ssl_client_certificate_enable | False | 0 | Whether or not to provide client-side certificates as a method of authentication to the server. Use ssl_client_certificate and ssl_client_private_key for further customization. To use SSL to verify the server's identity, use ssl_enable instead. |
+| ssl_mode | False | verify-full | SSL Protection method, see [postgres documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) for more information. Must be one of disable, allow, prefer, require, verify-ca, or verify-full. |
+| ssl_certificate_authority | False | ~/.postgresql/root.crl | The certificate authority that should be used to verify the server's identity. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
+| ssl_client_certificate | False | ~/.postgresql/postgresql.crt | The certificate that should be used to verify your identity to the server. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
+| ssl_client_private_key | False | ~/.postgresql/postgresql.key | The private key for the certificate you provided. Can be provided either as the certificate itself (in .env) or as a filepath to the certificate. |
+| ssl_storage_directory | False | .secrets | The folder in which to store SSL certificates provided as raw values. When a certificate/key is provided as a raw value instead of as a filepath, it must be written to a file before it can be used. This configuration option determines where that file is created. |
+| ssh_tunnel | False | None | SSH Tunnel Configuration, this is a json object |
+| ssh_tunnel.enable | False | 0 | Enable an ssh tunnel (also known as bastion host), see the other ssh_tunnel.* properties for more details |
+| ssh_tunnel.host | False | None | Host of the bastion host, this is the host we'll connect to via ssh |
+| ssh_tunnel.username | False | None | Username to connect to bastion host |
+| ssh_tunnel.port | False | 22 | Port to connect to bastion host |
+| ssh_tunnel.private_key | False | None | Private Key for authentication to the bastion host |
+| ssh_tunnel.private_key_password | False | None | Private Key Password, leave None if no password is set |
 
 A full list of supported settings and capabilities is available by running: `target-postgres --about`
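
As a rough illustration of the new README row, the sketch below builds a configuration with `sanitize_null_text_characters` switched on and serializes it to JSON, assuming the usual Singer convention of a JSON config file. The setting names come from the table above; every value (host, credentials, database name) is a made-up placeholder.

```python
import json

# Hypothetical connection values; only the setting names come from the README table.
config = {
    "host": "localhost",
    "port": 5432,
    "user": "postgres",
    "password": "example-password",
    "database": "warehouse",
    "default_target_schema": "melty",
    # Off by default according to the README; enabled here explicitly.
    "sanitize_null_text_characters": True,
}

config_json = json.dumps(config, indent=2)
print(config_json)
```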

target_postgres/connector.py

Lines changed: 9 additions & 0 deletions
@@ -112,6 +112,15 @@ def interpret_content_encoding(self) -> bool:
         """
         return self.config.get("interpret_content_encoding", False)
 
+    @cached_property
+    def sanitize_null_text_characters(self) -> bool:
+        """Whether to sanitize null text characters.
+
+        Returns:
+            True if the feature is enabled, False otherwise.
+        """
+        return self.config.get("sanitize_null_text_characters", False)
+
     def prepare_table(  # type: ignore[override] # noqa: PLR0913
         self,
         full_table_name: str | FullyQualifiedName,
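
The property added in this hunk reads the option from `self.config` with a default of `False` and caches the result. The following sketch reproduces that pattern on a stand-in class to show the default and the caching behavior; `DemoConnector` is hypothetical and is not the real connector class, which has many more responsibilities.

```python
from functools import cached_property
from typing import Any


class DemoConnector:
    """Hypothetical stand-in; only the property body mirrors the diff."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config

    @cached_property
    def sanitize_null_text_characters(self) -> bool:
        """Whether to sanitize null text characters."""
        return self.config.get("sanitize_null_text_characters", False)


default_connector = DemoConnector({})
enabled_connector = DemoConnector({"sanitize_null_text_characters": True})

# With no setting present, the feature is off.
default_enabled = default_connector.sanitize_null_text_characters
explicit_enabled = enabled_connector.sanitize_null_text_characters

# cached_property stores the first result on the instance, so a later
# change to the config dict is not picked up by this instance.
enabled_connector.config["sanitize_null_text_characters"] = False
still_enabled = enabled_connector.sanitize_null_text_characters
```

One consequence of `cached_property` is that the option is effectively fixed per connector instance after its first read, which is reasonable for a setting that is not expected to change during a run.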
