Commit f2acba3

feat: partition mask (#383)
* feat(partition): add venom test
* feat(partition): create empty partition mask
* feat(partition): test partition conditions
* feat(partition): exec active partition
* fix(partition): partitions must be ordered
* feat(partition): update docs
1 parent a3e799c commit f2acba3

7 files changed

+269
-0
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,10 @@ Types of changes
 - `Fixed` for any bug fixes.
 - `Security` in case of vulnerabilities.
 
+## [1.30.0]
+
+- `Added` mask `partition` to handle fields containing different types of values by applying distinct transformations
+
 ## [1.29.1]
 
 - `Fixed` mock command ignores global seed flag

README.md

Lines changed: 30 additions & 0 deletions
@@ -165,6 +165,7 @@ The following types of masks can be used :
 * [`replacement`](#replacement) is to mask a data with another data from the jsonline.
 * [`pipe`](#pipe) is a mask to handle complex nested array structures, it can read an array as an object stream and process it with a sub-pipeline.
 * [`apply`](#apply) process selected data with a sub-pipeline.
+* [`partitions`](#partitions) will rely on conditions to identify specific cases.
 * [`luhn`](#luhn) can generate valid numbers using the Luhn algorithm (e.g. french SIRET or SIREN).
 * [`markov`](#markov) can generate pseudo text based on a sample text.
 * [`findInCSV`](#findincsv) get one or multiple csv lines which matched with Json entry value from CSV files.
@@ -1069,6 +1070,35 @@ By default, if not specified otherwise, these classes will be used (input -> out
 
 [Return to list of masks](#possible-masks)
 
+### Partitions
+
+[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFNMkkBaJCEAGxAGMAXGMcq7pAKwngAHXKwAWyFLgCuYjlh55CXHkmFhWUDfAjKVNJHFzYQEkHigN5e7gHdRIREgDkAbxcA6abLBIAPkgATEAAzaQZWVBQ-JGwpCFYAJRASEAAPAFkRZlFUAD0AbVxqAC8AXQBqAAFCkoqAHTr3GrLygBIogF8Op0prKjEHXT79Zm1WXDhWCQn4AE9sSoCYczh3UewrPVpDYwkoAM3rO0HnN08ZUQ5ooNCpcMjo2PiklIysnJQCgAZqAE4K9pILo9YZIAaIXqg2ijODxCZTVBfJHIlGHHjbIwmVAwAZgNEqcFDPrQsbw6ZwOYbTBAA&i=N4KABGBECGCuAuALA9gJ0gLjAbXBKApgLbQCWANgAIAmyJpAdgHQDGdkANHhJAIwBMAZgAsAVgBsnblABSyRAzAARZAUh4AuiAC+QA)
+
+The partition mask will rely on conditions to identify specific cases and apply a defined list of masks for each case. Example configuration:
+
+```yaml
+  - selector:
+      jsonpath: "ID"
+    mask:
+      partitions: # only the first active condition will execute
+        - name: case1
+          when: '{{ regexMatch "P[A-Z]{3}[0-9]{3}" .ID }}'
+          then:
+            # List of masks for case 1
+            - constant: "this is case 1"
+        - name: case2
+          when: '{{ regexMatch "G[0-9]{11}" .ID }}'
+          then:
+            # List of masks for case 2
+            - constant: "this is case 2"
+        - name: default # case with no "when" condition will always execute
+          then:
+            # List of masks for unrecognized cases
+            - constant: "this is another case"
+```
+
+[Return to list of masks](#possible-masks)
+
 ### FindInCSV
 
[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFMkkBaJCEAGxAGMAXGMcyrpAKwngAOuFgAtkKYgDMYWbnkIRO3aklwATNUnGzlNScTUBJOAGEAygDUlyrgFcwUcSJYsBigPTuSUCCwB03qK2AEa2dGBM8CwgcP6R2O64YNje9IwQ7mgAnAAswUySkgDMAKwADGVoIADsIMElRbgATLgAHME5OVXtTUwAbO5guADu7llNTRX5Zbh91UVqJUwgTWhoM7jqOSWdIGp9TDmVZZJ9rdXuAjAEINjwfkwQwDo2lCAAHrisALLCTGIUV7cR7AZAAcgA3hCABQGD5IPyoAAqAE8BCAkBgAJRIAA+SHoMGG4CQAF9SWDAUC3rEwCjxFC-Cw0SAAPpockvV48L5MJJqazUkHgxkAOVw2Ax+MJxLAZIpVOpMRYdIZEL8cAlUpl4E51K4ipsH3RrD24mEVEY+BYVHgIC5NhEIHU4GQKtsIENyhVUGwbrAHswQA&i=N4KABGBEAuCeAOBTA+gRkgLigMwJYCdFIAacKAOwEMBbIrSAY0v1vIBNF9IQBfIA)

internal/app/pimo/pimo.go

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,7 @@ import (
 	"github.com/cgi-fr/pimo/pkg/markov"
 	"github.com/cgi-fr/pimo/pkg/model"
 	"github.com/cgi-fr/pimo/pkg/parquet"
+	"github.com/cgi-fr/pimo/pkg/partition"
 	"github.com/cgi-fr/pimo/pkg/pipe"
 	"github.com/cgi-fr/pimo/pkg/randdate"
 	"github.com/cgi-fr/pimo/pkg/randdura"
@@ -343,6 +344,7 @@ func injectMaskFactories() []model.MaskFactory {
 		sequence.Factory,
 		sha3.Factory,
 		apply.Factory,
+		partition.Factory,
 	}
 }
 

pkg/model/model.go

Lines changed: 7 additions & 0 deletions
@@ -241,6 +241,12 @@ type ApplyType struct {
 	URI string `yaml:"uri" json:"uri" jsonschema_description:"URI of the mask resource"`
 }
 
+type PartitionType struct {
+	Name string     `yaml:"name" json:"name" jsonschema_description:"name of the partition"`
+	When string     `yaml:"when,omitempty" json:"when,omitempty" jsonschema_description:"template to execute, if true the condition is active"`
+	Then []MaskType `yaml:"then" json:"then" jsonschema_description:"list of masks to execute if the condition is active"`
+}
+
 type MaskType struct {
 	Add Entry `yaml:"add,omitempty" json:"add,omitempty" jsonschema:"oneof_required=Add,title=Add Mask,description=Add a new field in the JSON stream"`
 	AddTransient Entry `yaml:"add-transient,omitempty" json:"add-transient,omitempty" jsonschema:"oneof_required=AddTransient,title=Add Transient Mask" jsonschema_description:"Add a new temporary field, that will not show in the JSON output"`
@@ -280,6 +286,7 @@ type MaskType struct {
 	Sequence SequenceType `yaml:"sequence,omitempty" json:"sequence,omitempty" jsonschema:"oneof_required=Sequence,title=Sequence Mask" jsonschema_description:"Generate a sequenced ID that follows specified format"`
 	Sha3 Sha3Type `yaml:"sha3,omitempty" json:"sha3,omitempty" jsonschema:"oneof_required=Sha3,title=Sha3 Mask" jsonschema_description:"Generate a variable-length crytographic hash (collision resistant)"`
 	Apply ApplyType `yaml:"apply,omitempty" json:"apply,omitempty" jsonschema:"oneof_required=Apply,title=Apply Mask" jsonschema_description:"Call external masking file"`
+	Partition []PartitionType `yaml:"partitions,omitempty" json:"partitions,omitempty" jsonschema:"oneof_required=Partition,title=Partition Mask" jsonschema_description:"Identify specific cases and apply a defined list of masks for each case"`
 }
 
 type Masking struct {
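
Reviewer note: the `when` field is described as a "template to execute, if true the condition is active". A minimal sketch of that contract using only the standard `text/template` package follows. It assumes the condition is active when the rendered text is exactly `true` and that an empty `when` is always active. `isActive` is a hypothetical helper, and the `regexMatch` function registered here is a local stand-in, not PIMO's template engine.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"text/template"
)

// isActive renders the `when` template against the record and reports whether
// the rendered text is exactly "true". An empty `when` is treated as active.
func isActive(when string, record map[string]string) (bool, error) {
	if when == "" {
		return true, nil
	}
	funcs := template.FuncMap{
		// Local stand-in taking (pattern, input), as in the README example.
		"regexMatch": func(pattern, s string) (bool, error) {
			return regexp.MatchString(pattern, s)
		},
	}
	t, err := template.New("when").Funcs(funcs).Parse(when)
	if err != nil {
		return false, err
	}
	var out bytes.Buffer
	if err := t.Execute(&out, record); err != nil {
		return false, err
	}
	return out.String() == "true", nil
}

func main() {
	record := map[string]string{"ID": "PABC123"}
	conditions := []string{
		`{{ regexMatch "^P[A-Z]{3}[0-9]{3}$" .ID }}`,
		`{{ regexMatch "^G[0-9]{11}$" .ID }}`,
		``, // no condition
	}
	for _, when := range conditions {
		active, err := isActive(when, record)
		fmt.Printf("%q active=%v err=%v\n", when, active, err)
	}
}
```

Comparing against the literal string `true` means a template that renders extra whitespace or any other text is treated as inactive.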

pkg/partition/partition.go

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+package partition
+
+import (
+	"bytes"
+	"hash/fnv"
+	tmpl "text/template"
+
+	"github.com/cgi-fr/pimo/pkg/template"
+
+	"github.com/cgi-fr/pimo/pkg/model"
+	"github.com/rs/zerolog/log"
+)
+
+type MaskEngine struct {
+	partitions []Partition
+	seed       int64
+	seeder     model.Seeder
+}
+
+type Partition struct {
+	name string
+	when *template.Engine
+	exec model.Pipeline
+}
+
+func buildDefinition(masks []model.MaskType, globalSeed int64) model.Definition {
+	definition := model.Definition{
+		Version:   "1",
+		Seed:      globalSeed,
+		Functions: nil,
+		Masking:   []model.Masking{},
+		Caches:    nil,
+	}
+
+	for _, mask := range masks {
+		definition.Masking = append(definition.Masking, model.Masking{
+			Selector: model.SelectorType{Jsonpath: "."},
+			Mask:     mask,
+		})
+	}
+
+	return definition
+}
+
+// NewMask returns a MaskEngine built from a list of partitions
+func NewMask(partitions []model.PartitionType, caches map[string]model.Cache, fns tmpl.FuncMap, seed int64, seeder model.Seeder, seedField string) (MaskEngine, error) {
+	parts := []Partition{}
+
+	// Build the pipeline of each partition
+	for _, partition := range partitions {
+		template, err := template.NewEngine(partition.When, fns, seed, seedField)
+		if err != nil {
+			return MaskEngine{}, err
+		}
+
+		if partition.When == "" {
+			template = nil
+		}
+
+		definition := buildDefinition(partition.Then, seed)
+		pipeline := model.NewPipeline(nil)
+		pipeline, _, err = model.BuildPipeline(pipeline, definition, caches, fns, "", "")
+		if err != nil {
+			return MaskEngine{}, err
+		}
+
+		parts = append(parts, Partition{
+			name: partition.Name,
+			when: template,
+			exec: pipeline,
+		})
+	}
+
+	return MaskEngine{parts, seed, seeder}, nil
+}
+
+func execPipeline(pipeline model.Pipeline, e model.Entry) (model.Entry, error) {
+	var result []model.Entry
+
+	err := pipeline.
+		WithSource(model.NewSourceFromSlice([]model.Dictionary{model.NewDictionary().With(".", e)})).
+		// Process(model.NewCounterProcessWithCallback("internal", 1, updateContext)).
+		AddSink(model.NewSinkToSlice(&result)).
+		Run()
+	if err != nil {
+		return nil, err
+	}
+
+	if len(result) == 0 {
+		return nil, nil
+	}
+
+	return result[0], nil
+}
+
+func (me MaskEngine) Mask(e model.Entry, context ...model.Dictionary) (model.Entry, error) {
+	log.Info().Msg("Mask partition")
+
+	// execute the first active partition, in declaration order
+	for _, partition := range me.partitions {
+		var output bytes.Buffer
+
+		if partition.when != nil {
+			if err := partition.when.Execute(&output, context[0].UnpackUnordered()); err != nil {
+				return nil, err
+			}
+		} else {
+			output.WriteString("true")
+		}
+
+		if output.String() == "true" {
+			log.Info().Msgf("Mask partition - executing partition %s", partition.name)
+
+			result, err := execPipeline(partition.exec, e)
+			if err != nil {
+				return e, err
+			}
+
+			return result, nil
+		}
+	}
+
+	return e, nil
+}
+
+// Factory creates a mask from a configuration
+func Factory(conf model.MaskFactoryConfiguration) (model.MaskEngine, bool, error) {
+	if len(conf.Masking.Mask.Partition) > 0 {
+		seeder := model.NewSeeder(conf.Masking.Seed.Field, conf.Seed)
+
+		// set a different seed for each jsonpath
+		h := fnv.New64a()
+		h.Write([]byte(conf.Masking.Selector.Jsonpath))
+		conf.Seed += int64(h.Sum64()) //nolint:gosec
+		mask, err := NewMask(conf.Masking.Mask.Partition, conf.Cache, conf.Functions, conf.Seed, seeder, conf.Masking.Seed.Field)
+		if err != nil {
+			return mask, true, err
+		}
+		return mask, true, nil
+	}
+	return nil, false, nil
+}
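
Reviewer note: `Factory` offsets the configured seed by the FNV-1a hash of the selector's jsonpath, so two partition masks on different fields get different seeds while the same field stays reproducible. The standalone sketch below shows that derivation in isolation. `deriveSeed` is a hypothetical name, not a function from the PIMO codebase.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// deriveSeed offsets a global seed by the 64-bit FNV-1a hash of a jsonpath.
// The uint64 to int64 conversion and the addition may wrap around; the result
// is only used as a seed, so wrapping is harmless.
func deriveSeed(globalSeed int64, jsonpath string) int64 {
	h := fnv.New64a()
	h.Write([]byte(jsonpath)) // hash.Hash.Write never returns an error
	return globalSeed + int64(h.Sum64())
}

func main() {
	fmt.Println(deriveSeed(42, "id") == deriveSeed(42, "id"))   // same path, same seed
	fmt.Println(deriveSeed(42, "id") == deriveSeed(42, "name")) // different paths
}
```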

schema/v1/pimo.schema.json

Lines changed: 39 additions & 0 deletions
@@ -584,6 +584,12 @@
             "apply"
           ],
           "title": "Apply"
+        },
+        {
+          "required": [
+            "partitions"
+          ],
+          "title": "Partition"
         }
       ],
       "properties": {
@@ -778,6 +784,14 @@
           "$ref": "#/$defs/ApplyType",
           "title": "Apply Mask",
           "description": "Call external masking file"
+        },
+        "partitions": {
+          "items": {
+            "$ref": "#/$defs/PartitionType"
+          },
+          "type": "array",
+          "title": "Partition Mask",
+          "description": "Identify specific cases and apply a defined list of masks for each case"
         }
       },
       "additionalProperties": false,
@@ -877,6 +891,31 @@
         "name"
       ]
     },
+    "PartitionType": {
+      "properties": {
+        "name": {
+          "type": "string",
+          "description": "name of the partition"
+        },
+        "when": {
+          "type": "string",
+          "description": "template to execute, if true the condition is active"
+        },
+        "then": {
+          "items": {
+            "$ref": "#/$defs/MaskType"
+          },
+          "type": "array",
+          "description": "list of masks to execute if the condition is active"
+        }
+      },
+      "additionalProperties": false,
+      "type": "object",
+      "required": [
+        "name",
+        "then"
+      ]
+    },
     "PipeType": {
       "properties": {
         "masking": {

test/suites/masking_partition.yml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+name: partition mask
+testcases:
+  - name: simple partition with default case
+    steps:
+      - script: |-
+          cat > masking.yml <<EOF
+          version: "1"
+          seed: 42
+          masking:
+            - selector:
+                jsonpath: "id"
+              mask:
+                partitions:
+                  - name: idrh
+                    when: '[[ .id | default "" | mustRegexMatch "^P[A-Z]{3}[0-9]{3}$" ]]'
+                    then:
+                      - constant: "IDRH"
+                  - name: digits
+                    when: '[[ .id | default "" | mustRegexMatch "^[0-9]+$" ]]'
+                    then:
+                      - constant: "DIGITS"
+                  - name: others
+                    then:
+                      - constant: "OTHER"
+          EOF
+      - script: sed -i "s/\[\[/\{\{/g" masking.yml
+      - script: sed -i "s/\]\]/\}\}/g" masking.yml
+      - script: |-
+          pimo <<EOF
+          {"case": 1, "id": "PZZZ123"}
+          {"case": 2, "id": "12345"}
+          {"case": 3, "id": "PABC000"}
+          {"case": 4, "id": "PABCD000"}
+          {"case": 5, "id": ""}
+          {"case": 6, "id": null}
+          EOF
+        assertions:
+          - result.code ShouldEqual 0
+          - 'result.systemout ShouldContainSubstring {"case":1,"id":"IDRH"}'
+          - 'result.systemout ShouldContainSubstring {"case":2,"id":"DIGITS"}'
+          - 'result.systemout ShouldContainSubstring {"case":3,"id":"IDRH"}'
+          - 'result.systemout ShouldContainSubstring {"case":4,"id":"OTHER"}'
+          - 'result.systemout ShouldContainSubstring {"case":5,"id":"OTHER"}'
+          - 'result.systemout ShouldContainSubstring {"case":6,"id":"OTHER"}'
+          - result.systemerr ShouldBeEmpty
