
Commit 1873072

feat: add mask segments (#385)
* feat(segment): create empty mask
* feat(segment): implement mask
* docs(segment): changelog and readme
1 parent f2acba3 commit 1873072


8 files changed, +242 -4 lines changed


CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@ Types of changes
 
 ## [1.30.0]
 
-- `Added` mask `partition` to handle fields containing different types of values by applying distinct transformations
+- `Added` mask `partitions` to handle fields containing different types of values by applying distinct transformations
+- `Added` mask `segments` to allow transformations on specific parts of a field's value using regular expressions to capture subgroups
 
 ## [1.29.1]
 
README.md

Lines changed: 26 additions & 0 deletions
@@ -166,6 +166,7 @@ The following types of masks can be used :
 * [`pipe`](#pipe) is a mask to handle complex nested array structures; it can read an array as an object stream and process it with a sub-pipeline.
 * [`apply`](#apply) processes selected data with a sub-pipeline.
 * [`partitions`](#partitions) will rely on conditions to identify specific cases.
+* [`segments`](#segments) allows transformations on specific parts of a field's value using regular expression subgroup captures.
 * [`luhn`](#luhn) can generate valid numbers using the Luhn algorithm (e.g. French SIRET or SIREN).
 * [`markov`](#markov) can generate pseudo text based on a sample text.
 * [`findInCSV`](#findincsv) gets one or more CSV lines matching a JSON entry value from CSV files.
@@ -1099,6 +1100,31 @@ The partition mask will rely on conditions to identify specific cases and apply
 
 [Return to list of masks](#possible-masks)
 
+### Segments
+
+[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFNMkkBaJCEAGxAGMAXGMcq7pAKwngAHXKwAWyFFAAmWHnkJcedECWwg4rCIqVIwKkAA8JAPQAKACgD8pgDxNWrcBAB8AbQCC1AFoBdAN4AzAC+AJRWtlJQJFCabgAM1ACc-sEhACSyOrogggy4zCDaWfaOkEVZNEgAZlVo5RXcBCAAngBiYDDYAKJwwBKtrWgA+l0AcgDCAEoAmqYAKgCSAPKjQwDSXdOZDUpSnbjEEu4AQuMAIl2tAOIAEgsAUmsAMgCyo0umAIqTAMpzAKoANQA6gANaZebZZSLRTT1HS0Gp1Sg7JRNNodbq9fqDEYTGbzZarDZbFGo7h7PCHVBxNAAJgCABYAKwANgA7AAORJYIA&i=N4KABGBECWAmkC4oAUCCAhAwgRgEwGZIQBfIA)
+
+The segments mask allows transformations on specific parts of a field's value. It uses a regular expression whose capturing groups must be named, and applies the list of masks configured under `replace` to each captured group individually. Example configuration:
+
+```yaml
+- selector:
+    jsonpath: "id"
+  mask:
+    segments:
+      regex: "^P(?P<letters>[A-Z]{3})(?P<digits>[0-9]{3})$"
+      replace:
+        letters:
+          - ff1:
+              keyFromEnv: "FF1_ENCRYPTION_KEY"
+              domain: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+        digits:
+          - ff1:
+              keyFromEnv: "FF1_ENCRYPTION_KEY"
+              domain: "0123456789"
+```
+
+[Return to list of masks](#possible-masks)
+
 ### FindInCSV
 
 [![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFMkkBaJCEAGxAGMAXGMcyrpAKwngAOuFgAtkKYgDMYWbnkIRO3aklwATNUnGzlNScTUBJOAGEAygDUlyrgFcwUcSJYsBigPTuSUCCwB03qK2AEa2dGBM8CwgcP6R2O64YNje9IwQ7mgAnAAswUySkgDMAKwADGVoIADsIMElRbgATLgAHME5OVXtTUwAbO5guADu7llNTRX5Zbh91UVqJUwgTWhoM7jqOSWdIGp9TDmVZZJ9rdXuAjAEINjwfkwQwDo2lCAAHrisALLCTGIUV7cR7AZAAcgA3hCABQGD5IPyoAAqAE8BCAkBgAJRIAA+SHoMGG4CQAF9SWDAUC3rEwCjxFC-Cw0SAAPpockvV48L5MJJqazUkHgxkAOVw2Ax+MJxLAZIpVOpMRYdIZEL8cAlUpl4E51K4ipsH3RrD24mEVEY+BYVHgIC5NhEIHU4GQKtsIENyhVUGwbrAHswQA&i=N4KABGBEAuCeAOBTA+gRkgLigMwJYCdFIAacKAOwEMBbIrSAY0v1vIBNF9IQBfIA)

internal/app/pimo/pimo.go

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,7 @@ import (
 	"github.com/cgi-fr/pimo/pkg/regex"
 	"github.com/cgi-fr/pimo/pkg/remove"
 	"github.com/cgi-fr/pimo/pkg/replacement"
+	"github.com/cgi-fr/pimo/pkg/segment"
 	"github.com/cgi-fr/pimo/pkg/sequence"
 	"github.com/cgi-fr/pimo/pkg/sha3"
 	"github.com/cgi-fr/pimo/pkg/statistics"
@@ -345,6 +346,7 @@ func injectMaskFactories() []model.MaskFactory {
 		sha3.Factory,
 		apply.Factory,
 		partition.Factory,
+		segment.Factory,
 	}
 }
 
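Registering the mask only takes adding `segment.Factory` to the slice returned by `injectMaskFactories`. The sketch below is a minimal, hypothetical model of why that is enough: every factory shares the `(engine, handled, error)` return shape, so a caller can offer a configuration to each factory and keep the engine from one that reports `true`. The types and the first-match `selectEngine` loop are assumptions for illustration; how PIMO actually walks the slice is not part of this diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// MaskEngine and Config are hypothetical, heavily simplified stand-ins for
// model.MaskEngine and model.MaskFactoryConfiguration.
type MaskEngine interface {
	Mask(value string) (string, error)
}

type Config struct {
	SegmentRegex string
}

// MaskFactory has the same (engine, handled, error) shape as segment.Factory.
type MaskFactory func(conf Config) (MaskEngine, bool, error)

// upperEngine is a placeholder engine that upper-cases its input.
type upperEngine struct{}

func (upperEngine) Mask(value string) (string, error) { return strings.ToUpper(value), nil }

// segmentLikeFactory claims a configuration only when a regex is set,
// like the len(...Segment.Regex) > 0 test in segment.Factory.
func segmentLikeFactory(conf Config) (MaskEngine, bool, error) {
	if len(conf.SegmentRegex) > 0 {
		return upperEngine{}, true, nil
	}
	return nil, false, nil
}

// selectEngine offers conf to each factory in order and returns the first
// engine whose factory reports that it handles the configuration.
func selectEngine(conf Config, factories []MaskFactory) (MaskEngine, error) {
	for _, factory := range factories {
		engine, handled, err := factory(conf)
		if err != nil {
			return nil, err
		}
		if handled {
			return engine, nil
		}
	}
	return nil, errors.New("no factory handles this configuration")
}

func main() {
	factories := []MaskFactory{segmentLikeFactory}

	engine, err := selectEngine(Config{SegmentRegex: "^P(?P<rest>.*)$"}, factories)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := engine.Mask("abc")
	fmt.Println(out) // ABC

	_, err = selectEngine(Config{}, factories)
	fmt.Println(err) // no factory handles this configuration
}
```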
pkg/model/model.go

Lines changed: 7 additions & 1 deletion
@@ -247,6 +247,11 @@ type PartitionType struct {
 	Then []MaskType `yaml:"then" json:"then" jsonschema_description:"list of masks to execute if the condition is active"`
 }
 
+type SegmentType struct {
+	Regex   string                `yaml:"regex" json:"regex" jsonschema_description:"regex used to create segments using group captures, groups must be named"`
+	Replace map[string][]MaskType `yaml:"replace" json:"replace" jsonschema_description:"list of masks to execute for each group"`
+}
+
 type MaskType struct {
 	Add Entry `yaml:"add,omitempty" json:"add,omitempty" jsonschema:"oneof_required=Add,title=Add Mask,description=Add a new field in the JSON stream"`
 	AddTransient Entry `yaml:"add-transient,omitempty" json:"add-transient,omitempty" jsonschema:"oneof_required=AddTransient,title=Add Transient Mask" jsonschema_description:"Add a new temporary field, that will not show in the JSON output"`
@@ -286,7 +291,8 @@ type MaskType struct {
 	Sequence SequenceType `yaml:"sequence,omitempty" json:"sequence,omitempty" jsonschema:"oneof_required=Sequence,title=Sequence Mask" jsonschema_description:"Generate a sequenced ID that follows specified format"`
 	Sha3 Sha3Type `yaml:"sha3,omitempty" json:"sha3,omitempty" jsonschema:"oneof_required=Sha3,title=Sha3 Mask" jsonschema_description:"Generate a variable-length cryptographic hash (collision resistant)"`
 	Apply ApplyType `yaml:"apply,omitempty" json:"apply,omitempty" jsonschema:"oneof_required=Apply,title=Apply Mask" jsonschema_description:"Call external masking file"`
-	Partition []PartitionType `yaml:"partitions,omitempty" json:"partitions,omitempty" jsonschema:"oneof_required=Partition,title=Partition Mask" jsonschema_description:"Identify specific cases and apply a defined list of masks for each case"`
+	Partition []PartitionType `yaml:"partitions,omitempty" json:"partitions,omitempty" jsonschema:"oneof_required=Partition,title=Partitions Mask" jsonschema_description:"Identify specific cases and apply a defined list of masks for each case"`
+	Segment SegmentType `yaml:"segments,omitempty" json:"segments,omitempty" jsonschema:"oneof_required=Segment,title=Segments Mask" jsonschema_description:"Allow transformations on specific parts of a field's value"`
 }
 
 type Masking struct {

pkg/partition/partition.go

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ func execPipeline(pipeline model.Pipeline, e model.Entry) (model.Entry, error) {
 }
 
 func (me MaskEngine) Mask(e model.Entry, context ...model.Dictionary) (model.Entry, error) {
-	log.Info().Msg("Mask partition")
+	log.Info().Msg("Mask partitions")
 
 	// exec all partitions
 	for _, partition := range me.partitions {

pkg/segment/segment.go

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+package segment
+
+import (
+	"hash/fnv"
+	"regexp"
+	"strings"
+	tmpl "text/template"
+
+	"github.com/cgi-fr/pimo/pkg/model"
+	"github.com/rs/zerolog/log"
+)
+
+type MaskEngine struct {
+	re        *regexp.Regexp
+	pipelines map[string]model.Pipeline
+	seed      int64
+	seeder    model.Seeder
+}
+
+func buildDefinition(masks []model.MaskType, globalSeed int64) model.Definition {
+	definition := model.Definition{
+		Version:   "1",
+		Seed:      globalSeed,
+		Functions: nil,
+		Masking:   []model.Masking{},
+		Caches:    nil,
+	}
+
+	for _, mask := range masks {
+		definition.Masking = append(definition.Masking, model.Masking{
+			Selector: model.SelectorType{Jsonpath: "."},
+			Mask:     mask,
+		})
+	}
+
+	return definition
+}
+
+// NewMask returns a MaskEngine built from a segments configuration
+func NewMask(segment model.SegmentType, caches map[string]model.Cache, fns tmpl.FuncMap, seed int64, seeder model.Seeder, seedField string) (MaskEngine, error) {
+	var err error
+
+	pipelines := map[string]model.Pipeline{}
+
+	for groupname, masks := range segment.Replace {
+		definition := buildDefinition(masks, seed)
+		pipeline := model.NewPipeline(nil)
+		pipeline, _, err = model.BuildPipeline(pipeline, definition, caches, fns, "", "")
+		if err != nil {
+			return MaskEngine{}, err
+		}
+
+		pipelines[groupname] = pipeline
+	}
+
+	return MaskEngine{
+		re:        regexp.MustCompile(segment.Regex),
+		pipelines: pipelines,
+		seed:      seed,
+		seeder:    seeder,
+	}, nil
+}
+
+// replace substitutes the named groups captured in the `value` string with the values computed by the `replacements` map
+func replace(value string, re *regexp.Regexp, replacements map[string]func(string) (string, error)) (string, error) {
+	result := &strings.Builder{}
+
+	matchIndexes := re.FindStringSubmatchIndex(value)
+	groupNames := re.SubexpNames()
+
+	writeCount := 0
+	for i := 2; i < len(matchIndexes); i += 2 {
+		groupNumber := i / 2
+		groupName := groupNames[groupNumber]
+		startIndex := matchIndexes[i]
+		endIndex := matchIndexes[i+1]
+		capturedValue := value[startIndex:endIndex]
+
+		result.WriteString(value[writeCount:startIndex])
+		writeCount = endIndex
+
+		if replacement, exists := replacements[groupName]; exists {
+			if masked, err := replacement(capturedValue); err != nil {
+				return value, err
+			} else {
+				result.WriteString(masked)
+			}
+		}
+	}
+	result.WriteString(value[writeCount:])
+
+	return result.String(), nil
+}
+
+func (me MaskEngine) Mask(e model.Entry, context ...model.Dictionary) (model.Entry, error) {
+	log.Info().Msg("Mask segments")
+
+	replacements := map[string]func(string) (string, error){}
+
+	for groupname, pipeline := range me.pipelines {
+		replacements[groupname] = func(match string) (string, error) {
+			var result []model.Entry
+			err := pipeline.
+				WithSource(model.NewSourceFromSlice([]model.Dictionary{model.NewDictionary().With(".", match)})).
+				AddSink(model.NewSinkToSlice(&result)).
+				Run()
+			if err != nil {
+				return match, err
+			}
+			return result[0].(string), nil
+		}
+	}
+
+	result, err := replace(e.(string), me.re, replacements)
+	if err != nil {
+		return e, err
+	}
+
+	return result, nil
+}
+
+// Factory creates a mask from a configuration
+func Factory(conf model.MaskFactoryConfiguration) (model.MaskEngine, bool, error) {
+	if len(conf.Masking.Mask.Segment.Regex) > 0 {
+		seeder := model.NewSeeder(conf.Masking.Seed.Field, conf.Seed)
+
+		// set different seeds for different jsonpaths
+		h := fnv.New64a()
+		h.Write([]byte(conf.Masking.Selector.Jsonpath))
+		conf.Seed += int64(h.Sum64()) //nolint:gosec
+		mask, err := NewMask(conf.Masking.Mask.Segment, conf.Cache, conf.Functions, conf.Seed, seeder, conf.Masking.Seed.Field)
+		if err != nil {
+			return mask, true, err
+		}
+		return mask, true, nil
+	}
+	return nil, false, nil
+}
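`Factory` offsets the global seed by a hash of the selector's jsonpath, so two fields masked with the same configuration do not share a seed. The sketch below isolates that computation with the standard `hash/fnv` package; `deriveSeed` is a hypothetical name, and the arithmetic mirrors the `conf.Seed += int64(h.Sum64())` line, including the wrap-around of the `uint64` to `int64` conversion and of the addition.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// deriveSeed is a hypothetical helper mirroring Factory: the global seed is
// offset by the 64-bit FNV-1a hash of the selector's jsonpath. Converting the
// uint64 hash to int64 and adding both rely on Go's two's-complement wrap-around.
func deriveSeed(globalSeed int64, jsonpath string) int64 {
	h := fnv.New64a()
	h.Write([]byte(jsonpath)) // a hash.Hash Write never returns an error
	return globalSeed + int64(h.Sum64())
}

func main() {
	for _, path := range []string{"id", "name", "address.city"} {
		fmt.Printf("%-14s %d\n", path, deriveSeed(42, path))
	}
}
```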

schema/v1/pimo.schema.json

Lines changed: 36 additions & 1 deletion
@@ -590,6 +590,12 @@
             "partitions"
           ],
           "title": "Partition"
+        },
+        {
+          "required": [
+            "segments"
+          ],
+          "title": "Segment"
         }
       ],
       "properties": {
@@ -790,8 +796,13 @@
             "$ref": "#/$defs/PartitionType"
           },
           "type": "array",
-          "title": "Partition Mask",
+          "title": "Partitions Mask",
           "description": "Identify specific cases and apply a defined list of masks for each case"
+        },
+        "segments": {
+          "$ref": "#/$defs/SegmentType",
+          "title": "Segments Mask",
+          "description": "Allow transformations on specific parts of a field's value"
         }
       },
       "additionalProperties": false,
@@ -1030,6 +1041,30 @@
       "additionalProperties": false,
       "type": "object"
     },
+    "SegmentType": {
+      "properties": {
+        "regex": {
+          "type": "string",
+          "description": "regex used to create segments using group captures, groups must be named"
+        },
+        "replace": {
+          "additionalProperties": {
+            "items": {
+              "$ref": "#/$defs/MaskType"
+            },
+            "type": "array"
+          },
+          "type": "object",
+          "description": "list of masks to execute for each group"
+        }
+      },
+      "additionalProperties": false,
+      "type": "object",
+      "required": [
+        "regex",
+        "replace"
+      ]
+    },
     "SelectorType": {
       "properties": {
         "jsonpath": {
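The schema and the `SegmentType` description both state that groups must be named, but `NewMask` compiles the pattern with `regexp.MustCompile`, which panics on an invalid expression instead of returning an error. A pre-flight check along the following lines could report such problems as errors. `validateSegment` is a hypothetical helper, and the rule that every `replace` key must name a group is an assumption inferred from how `replace` looks groups up by name.

```go
package main

import (
	"fmt"
	"regexp"
)

// validateSegment is a hypothetical pre-flight check for a segments
// configuration: the regex must compile, every capturing group must be named,
// and every key of replace must name a group of the regex.
func validateSegment(pattern string, replaceKeys []string) error {
	re, err := regexp.Compile(pattern) // returns an error instead of panicking like MustCompile
	if err != nil {
		return fmt.Errorf("invalid regex: %w", err)
	}
	groups := map[string]bool{}
	for i, name := range re.SubexpNames() {
		if i == 0 {
			continue // index 0 is the whole match
		}
		if name == "" {
			return fmt.Errorf("capture group %d is not named", i)
		}
		groups[name] = true
	}
	for _, key := range replaceKeys {
		if !groups[key] {
			return fmt.Errorf("replace key %q does not match any named group", key)
		}
	}
	return nil
}

func main() {
	ok := `^P(?P<letters>[A-Z]{3})(?P<digits>[0-9]{3})$`
	fmt.Println(validateSegment(ok, []string{"letters", "digits"}))
	fmt.Println(validateSegment(`^P([A-Z]{3})$`, nil))
	fmt.Println(validateSegment(ok, []string{"letter"}))
	fmt.Println(validateSegment(`^P(?P<x>[A-Z]$`, nil))
}
```

Non-capturing groups such as `(?:...)` do not appear in `SubexpNames`, so they pass this check.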

test/suites/masking_segment.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: segment mask
+testcases:
+  - name: simple segmentation test
+    steps:
+      - script: |-
+          cat > masking.yml <<EOF
+          version: "1"
+          seed: 42
+          masking:
+            - selector:
+                jsonpath: "id"
+              mask:
+                segments:
+                  regex: "^P(?P<letters>[A-Z]{3})(?P<digits>[0-9]{3})$"
+                  replace:
+                    letters:
+                      - ff1:
+                          keyFromEnv: "FF1_ENCRYPTION_KEY"
+                          domain: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+                    digits:
+                      - ff1:
+                          keyFromEnv: "FF1_ENCRYPTION_KEY"
+                          domain: "0123456789"
+          EOF
+      - script: |-
+          echo '{"id": "PABC123"}' | FF1_ENCRYPTION_KEY="70NZ2NWAqk9/A21vBPxqlA==" pimo
+        assertions:
+          - result.code ShouldEqual 0
+          - result.systemoutjson.id ShouldEqual PVBR675
+          - result.systemerr ShouldBeEmpty
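The test expects `PABC123` to become `PVBR675`: the literal `P` sits outside both groups and is copied through, while FF1, a format-preserving cipher, keeps each group's length and alphabet. A small property check in that spirit, using the hypothetical helper `samePattern`, verifies that both the input and the masked output still match the segment regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern is the same pattern as in the test suite.
var idPattern = regexp.MustCompile(`^P(?P<letters>[A-Z]{3})(?P<digits>[0-9]{3})$`)

// samePattern is a hypothetical check that the input and the masked output
// both match the segment regex, as format-preserving masks should guarantee.
func samePattern(in, out string) bool {
	return idPattern.MatchString(in) && idPattern.MatchString(out)
}

func main() {
	fmt.Println(samePattern("PABC123", "PVBR675")) // true
	fmt.Println(samePattern("PABC123", "PVBR67"))  // false
}
```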
