# bmap-rs

- The bmap-rs project aims to implement tools related to bmap. The project is written in
- rust. The inspiration for it is an existing project that is written in python called
- [bmap-tools](https://salsa.debian.org/debian/bmap-tools).
+ ## Introduction
+
+ `bmap-rs` is a generic tool for copying files using the block map. The idea is that
+ large files, like raw system image files, can be copied or flashed a lot faster and
+ more reliably with `bmap-rs` than with traditional tools, like `dd` or `cp`. The
+ project is written in Rust. The inspiration for it is an existing project that is
+ written in Python called [bmap-tools](https://salsa.debian.org/debian/bmap-tools).
+
+ The goal of rewriting it is to be able to create smaller disk images without Python
+ dependencies.

Right now the implemented function is copying system images files using bmap, which is
- safer and faster than regular cp ro dd. That can be used to flash images into block
- devices.
+ safer and faster than regular cp or dd. It can be used to flash system images into block
+ devices, but it can also be used for general image flashing purposes.

## Usage
bmap-rs supports 1 subcommand:
@@ -18,5 +25,181 @@ bmap-rs copy <SOURCE_PATH> <TARGET_PATH>
The bmap file is automatically searched in the source directory. The recommendation is
to name it as the source but with bmap extension.

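The naming recommendation above can be read in two ways, so the following is only a sketch of both readings, not the lookup that `bmap-rs` actually performs. The helper `bmap_candidates` is hypothetical and exists purely for illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of bmap-rs): lists bmap file names that follow
/// the "same name as the source, but with a bmap extension" recommendation.
fn bmap_candidates(source: &Path) -> Vec<PathBuf> {
    // Reading 1: append ".bmap" to the full name (image.raw -> image.raw.bmap).
    let mut appended = source.as_os_str().to_os_string();
    appended.push(".bmap");
    let mut candidates = vec![PathBuf::from(appended)];

    // Reading 2: replace the last extension (image.raw -> image.bmap).
    let replaced = source.with_extension("bmap");
    if !candidates.contains(&replaced) {
        candidates.push(replaced);
    }
    candidates
}

fn main() {
    for candidate in bmap_candidates(Path::new("images/image.raw")) {
        println!("{}", candidate.display());
    }
}
```

Both candidates stay in the source directory, which matches the statement that the bmap file is searched next to the source.
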
+ ## Concept
+
+ This section provides general information about the block map (bmap) necessary
+ for understanding how `bmap-rs` works. The structure of the section is:
+
+ * "Sparse files" - the bmap ideas are based on sparse files, so it is important
+   to understand what sparse files are.
+ * "The bmap" - explains what bmap is.
+ * "Raw images" - the main usage scenario for `bmap-rs` is flashing raw images,
+   which this section discusses.
+ * "Usage scenarios" - describes various possible bmap and `bmap-rs` usage
+   scenarios.
+
+ ### Sparse files
+
+ One of the main roles of a filesystem, generally speaking, is to map blocks of
+ file data to disk sectors. Different filesystems do this mapping differently,
+ and filesystem performance largely depends on how well the filesystem can do
+ the mapping. The filesystem block size is usually 4 KiB, but may also be 8 KiB or
+ larger.
+
+ To implement the mapping, the filesystem has to maintain some kind of on-disk
+ index. For any file on the filesystem, and any offset within the file, the
+ index allows you to find the corresponding disk sector, which stores the
+ file's data. Whenever we write to a file, the filesystem looks up the index
+ and writes to the corresponding disk sectors. Sometimes the filesystem has to
+ allocate new disk sectors and update the index (such as when appending data to
+ the file). The filesystem index is sometimes referred to as the "filesystem
+ metadata".
+
+ What happens if a file area is not mapped to any disk sectors? Is this
+ possible? The answer is yes. Such unmapped areas are often called "holes",
+ and files that have holes are often called "sparse files".
+
+ All reasonable filesystems, like Linux ext[234], btrfs, and XFS, or Solaris ZFS,
+ and even Windows' NTFS, support sparse files. Old and less reasonable
+ filesystems, like FAT, do not support holes.
+
+ Reading holes returns zeroes. Writing to a hole causes the filesystem to
+ allocate disk sectors for the corresponding blocks. Here is how you can create
+ a 4GiB file with all blocks unmapped, which means that the file consists of a
+ huge 4GiB hole:
+
+ ```bash
+ $ truncate -s 4G image.raw
+ $ stat image.raw
+   File: image.raw
+   Size: 4294967296 Blocks: 0 IO Block: 4096 regular file
+ ```
+
+ Notice that `image.raw` is a 4GiB file, which occupies 0 blocks on the disk!
+ So, the entire file's contents are not mapped anywhere. Reading this file would
+ result in reading 4GiB of zeroes. If you write to the middle of `image.raw`,
+ you'll end up with 2 holes and a mapped area in the middle.
+
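The same experiment can be sketched in Rust with only the standard library, assuming a Unix-like system and a filesystem that supports holes. The helpers `create_sparse` and `read_at` and the temporary file name are made up for this illustration and are not part of `bmap-rs`.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Creates a file of `size` bytes that is one big hole, writes `payload` at
/// `offset`, and returns (apparent size, bytes allocated on disk).
fn create_sparse(path: &Path, size: u64, offset: u64, payload: &[u8]) -> io::Result<(u64, u64)> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.set_len(size)?; // like `truncate -s`: no blocks are mapped yet
    file.seek(SeekFrom::Start(offset))?;
    file.write_all(payload)?; // maps only the blocks touched by this write
    file.sync_all()?;
    let meta = file.metadata()?;
    // st_blocks is counted in 512-byte units.
    Ok((meta.len(), meta.blocks() * 512))
}

/// Reads `len` bytes starting at `offset`.
fn read_at(path: &Path, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("bmap-rs-sparse-demo.raw");
    let size = 64 * 1024 * 1024;
    let (apparent, allocated) = create_sparse(&path, size, size / 2, b"data")?;
    println!("apparent size: {} bytes, allocated: {} bytes", apparent, allocated);
    println!("start of the leading hole: {:?}", read_at(&path, 0, 4)?);
    fs::remove_file(&path)
}
```

On a filesystem with sparse file support, the allocated size should be far smaller than the apparent size. The exact number depends on the filesystem and its block size, so the sketch prints it rather than assuming a value.
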
+ Therefore:
+ * Sparse files are files with holes.
+ * Sparse files help save disk space, because, roughly speaking, holes do not
+   occupy disk space.
+ * A hole is an unmapped area of a file, meaning that it is not mapped anywhere
+   on the disk.
+ * Reading data from a hole returns zeroes.
+ * Writing data to a hole destroys it by forcing the filesystem to map
+   corresponding file areas to disk sectors.
+ * Filesystems usually operate with blocks, so sizes and offsets of holes are
+   aligned to the block boundary.
+
+ It is also useful to know that you should work with sparse files carefully. It
+ is easy to accidentally expand a sparse file, that is, to map all holes to
+ zero-filled disk areas. For example, `scp` always expands sparse files, and the
+ `tar` and `rsync` tools do the same by default, unless you use the `--sparse`
+ option. Compressing and then decompressing a sparse file usually expands it.
+
+ There are two ioctls in Linux which allow you to find mapped and unmapped areas:
+ `FIBMAP` and `FIEMAP`. The former is very old and is probably supported by all
+ Linux systems, but it is rather limited and requires root privileges. The
+ latter is a lot more advanced and does not require root privileges, but it is
+ newer (added in Linux kernel version 2.6.28).
+
+ Linux kernels starting from version 3.1 also support the
+ `SEEK_HOLE` and `SEEK_DATA` values for the `whence` argument of the standard
+ `lseek()` system call. They allow positioning to the next hole and the next
+ mapped area of the file.
+
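As an illustration of `SEEK_DATA` and `SEEK_HOLE`, the sketch below walks a file and collects its mapped ranges. It assumes 64-bit Linux (a 64-bit `off_t`, `SEEK_DATA = 3`, `SEEK_HOLE = 4`, `ENXIO = 6`) and declares `lseek()` by hand to avoid external crates. The function `data_ranges` is hypothetical and does not describe how `bmap-rs` itself works.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Seek, SeekFrom, Write};
use std::os::unix::io::AsRawFd;

// Assumption: values from <unistd.h> and <errno.h> on 64-bit Linux.
const SEEK_DATA: i32 = 3;
const SEEK_HOLE: i32 = 4;
const ENXIO: i32 = 6;

// Requires Rust 1.82 or newer; older toolchains spell this `extern "C"`.
unsafe extern "C" {
    fn lseek(fd: i32, offset: i64, whence: i32) -> i64;
}

/// Returns the mapped byte ranges of `file` as half-open (start, end) pairs.
/// Note that this moves the file offset of the descriptor.
fn data_ranges(file: &File) -> io::Result<Vec<(u64, u64)>> {
    let fd = file.as_raw_fd();
    let len = file.metadata()?.len() as i64;
    let mut ranges = Vec::new();
    let mut pos = 0i64;
    while pos < len {
        // Jump to the next mapped area at or after `pos`.
        let start = unsafe { lseek(fd, pos, SEEK_DATA) };
        if start < 0 {
            let err = io::Error::last_os_error();
            // ENXIO: no more data, the rest of the file is a hole.
            if err.raw_os_error() == Some(ENXIO) {
                break;
            }
            return Err(err);
        }
        // The mapped area ends where the next hole starts (EOF counts as a hole).
        let end = unsafe { lseek(fd, start, SEEK_HOLE) };
        if end < 0 {
            return Err(io::Error::last_os_error());
        }
        ranges.push((start as u64, end as u64));
        pos = end;
    }
    Ok(ranges)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("bmap-rs-seek-demo.raw");
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    file.set_len(8 * 1024 * 1024)?;
    file.seek(SeekFrom::Start(4 * 1024 * 1024))?;
    file.write_all(b"data")?;
    file.sync_all()?;
    for (start, end) in data_ranges(&file)? {
        println!("mapped: {}..{}", start, end);
    }
    std::fs::remove_file(&path)
}
```

The reported ranges are aligned to whatever granularity the filesystem uses, and a filesystem without hole support simply reports the whole file as one mapped range.
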
+ Advanced Linux filesystems, in modern kernels, also allow "punching holes",
+ meaning that it is possible to unmap any aligned area and turn it into a hole.
+ This is implemented using the `FALLOC_FL_PUNCH_HOLE` flag in the `mode`
+ argument of the `fallocate()` system call.
+
+ ### The bmap
+
+ The bmap is an XML file, which contains a list of mapped areas, plus some
+ additional information about the file it was created for, for example:
+ * SHA256 checksum of the bmap file itself
+ * SHA256 checksum of the mapped areas
+ * the original file size
+ * amount of mapped data
+
+ The bmap file is designed to be both easily machine-readable and
+ human-readable. All the machine-readable information is provided by XML tags.
+ The human-oriented information is in XML comments, which explain the meaning of
+ XML tags and provide useful information like the amount of mapped data in
+ percent and in MiB or GiB.
+
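To make the "amount of mapped data in percent and in MiB" figure concrete, here is a small sketch of how it could be derived. The `BmapSummary` struct and its field names are hypothetical, and the block size field is an assumption that the text above does not list. It is not the data model used by `bmap-rs`.

```rust
/// Hypothetical summary of a bmap file; field names are illustrative only.
struct BmapSummary {
    image_size: u64,    // original file size in bytes
    block_size: u64,    // assumed: size of one block in bytes
    mapped_blocks: u64, // number of blocks listed as mapped
}

impl BmapSummary {
    /// Total number of blocks in the image, rounding the last block up.
    fn total_blocks(&self) -> u64 {
        (self.image_size + self.block_size - 1) / self.block_size
    }

    /// Amount of mapped data in bytes.
    fn mapped_bytes(&self) -> u64 {
        self.mapped_blocks * self.block_size
    }

    /// Share of the image that is mapped, in percent.
    fn mapped_percent(&self) -> f64 {
        100.0 * self.mapped_blocks as f64 / self.total_blocks() as f64
    }
}

fn main() {
    // A 4 GiB image with 4 KiB blocks, of which 1 GiB is mapped.
    let summary = BmapSummary {
        image_size: 4 * 1024 * 1024 * 1024,
        block_size: 4096,
        mapped_blocks: 262_144,
    };
    println!(
        "mapped: {:.1} MiB ({:.1}% of {} blocks)",
        summary.mapped_bytes() as f64 / (1024.0 * 1024.0),
        summary.mapped_percent(),
        summary.total_blocks()
    );
}
```
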
+ ### Raw images
+
+ Raw images are the simplest type of system images which may be flashed to the
+ target block device, block-by-block, without any further processing. Raw images
+ just "mirror" the target block device: they usually start with the MBR sector.
+ There is a partition table at the beginning of the image and one or more
+ partitions containing filesystems, like ext4. Usually, no special tools are
+ required to flash a raw image to the target block device.
+
+ Therefore:
+ * raw images are distributed in a compressed form, and they are almost as small
+   as a tarball (that includes all the data the image would take)
+ * the bmap file and `bmap-rs` make it possible to quickly flash the
+   compressed raw image to the target block device
+
+ Even more importantly, flashing raw images is extremely fast because you write
+ directly to the block device, and write sequentially.
+
+ Another great thing about raw images is that they may be 100% ready-to-go and
+ all you need to do is to put the image on your device "as-is". You do not have
+ to know the image format, which partitions and filesystems it contains, etc.
+ This is simple and robust.
+
+ ### Usage scenarios
+
+ Flashing or copying large images is the main `bmap-rs` use case. The idea is
+ that if you have a raw image file and its bmap, you can flash it to a device by
+ writing only the mapped blocks and skipping the unmapped blocks.
+
+ This means that with bmap it is not necessary to try to
+ minimize the raw image size by making the partitions small, which would require
+ resizing them. The image can contain huge multi-gigabyte partitions, just like
+ the target device requires. The image will then be a huge sparse file, with
+ little mapped data. And because unmapped areas "contain" zeroes, the huge image
+ will compress extremely well, so it will be very small in
+ compressed form. It can then be distributed in compressed form, and flashed
+ very quickly with `bmap-rs` and the bmap file, because `bmap-rs` will decompress
+ the image on-the-fly and write only mapped areas.
+
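The idea of writing only mapped blocks can be sketched with in-memory buffers standing in for the image and the target device. `copy_mapped` is a hypothetical function for this illustration, not the `bmap-rs` implementation, and it leaves out decompression and checksum verification.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};
use std::ops::Range;

/// Copies only the mapped block ranges from `src` to the same offsets in `dst`
/// and returns the number of bytes written. Unmapped blocks are never touched.
fn copy_mapped<R: Read + Seek, W: Write + Seek>(
    src: &mut R,
    dst: &mut W,
    block_size: u64,
    mapped: &[Range<u64>],
) -> io::Result<u64> {
    let mut copied = 0;
    for range in mapped {
        let offset = range.start * block_size;
        let len = (range.end - range.start) * block_size;
        src.seek(SeekFrom::Start(offset))?;
        dst.seek(SeekFrom::Start(offset))?;
        // `take` limits the copy to the bytes of this range.
        copied += io::copy(&mut src.by_ref().take(len), dst)?;
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    let block_size = 4;
    // Blocks 0 and 3 hold data; blocks 1 and 2 are holes that read as zeroes.
    let mut image = Cursor::new(b"AAAA\0\0\0\0\0\0\0\0BBBB".to_vec());
    // The target is pre-filled with 'x' to stand in for stale device contents.
    let mut target = Cursor::new(vec![b'x'; 16]);
    let copied = copy_mapped(&mut image, &mut target, block_size, &[0..1, 3..4])?;
    println!(
        "copied {} bytes, target is now {:?}",
        copied,
        String::from_utf8_lossy(target.get_ref())
    );
    Ok(())
}
```

Only half of the image is written here, and the stale bytes in the skipped blocks stay as they were, which is acceptable because holes carry no expectations about their contents.
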
+ An additional benefit of using bmap for flashing is checksum verification.
+ The `bmap-rs copy` command verifies the SHA256 checksums while
+ writing. Integrity of the bmap file itself is also protected by a SHA256
+ checksum, and `bmap-rs` verifies it before starting flashing.
+
+ The second usage scenario is reconstructing sparse files. Generally speaking, if
+ you had a sparse file but then expanded it, there is no way to reconstruct it.
+ In some cases, something like
+
+ ```bash
+ $ cp --sparse=always expanded.file reconstructed.file
+ ```
+
+ would be enough. However, a file reconstructed this way will not necessarily be
+ the same as the original sparse file. The original sparse file could have
+ contained mapped blocks filled with all zeroes (not holes), and, in the
+ reconstructed file, these blocks will become holes. In some cases, this does
+ not matter, for example, if you just want to save disk space. However, for
+ flashing raw images it does matter, because it is essential to write zero-filled
+ blocks and not skip them. Indeed, if you do not write the zero-filled block to
+ corresponding disk sectors which, presumably, contain garbage, you end up with
+ garbage in those blocks. In other words, when we are talking about flashing raw
+ images, the difference between zero-filled blocks and holes in the original
+ image is essential because zero-filled blocks are the required blocks which are
+ expected to contain zeroes, while holes are just unneeded blocks with no
+ expectations regarding the contents.
+
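The difference between zero-filled blocks and holes can be shown with a small sketch. A tool that only looks at file contents sees every all-zero block as a hole candidate, while the bmap records which of those blocks were actually mapped. Both functions below are hypothetical helpers, and the list of mapped block indices is a simplified stand-in for the ranges stored in a bmap file.

```rust
/// Returns the indices of blocks that contain only zero bytes. This is all a
/// content-based tool can see once a sparse file has been expanded.
fn zero_blocks(data: &[u8], block_size: usize) -> Vec<usize> {
    data.chunks(block_size)
        .enumerate()
        .filter(|(_, block)| block.iter().all(|&b| b == 0))
        .map(|(index, _)| index)
        .collect()
}

/// Returns the zero-filled blocks that the original map lists as mapped. These
/// must still be written when flashing, so turning them into holes loses data.
fn mapped_zero_blocks(data: &[u8], block_size: usize, mapped: &[usize]) -> Vec<usize> {
    zero_blocks(data, block_size)
        .into_iter()
        .filter(|index| mapped.contains(index))
        .collect()
}

fn main() {
    // Four 4-byte blocks: data, zeroes, zeroes, data.
    let expanded = b"AAAA\0\0\0\0\0\0\0\0BBBB";
    // In the original sparse file, block 1 was a mapped zero-filled block and
    // block 2 was a hole.
    let mapped = [0, 1, 3];
    println!("all-zero blocks: {:?}", zero_blocks(expanded, 4));
    println!(
        "zero blocks that must still be written: {:?}",
        mapped_zero_blocks(expanded, 4, &mapped)
    );
}
```
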
+ `bmap-rs` may be helpful for reconstructing sparse files properly. Before the
+ sparse file is expanded, you should generate its bmap. Then you may compress
+ your file or, otherwise, expand it. Later on, you may reconstruct it using the
+ `bmap-rs copy` command.
+
## License
bmap-rs is licensed under dual Apache-2.0 and MIT licenses.