To better understand the large number of options in unpaper, I ran several tests that show the differences between settings.
I scanned a lot of sheet music and deskewed the scans manually with another application. Everything was done in 8-bit gray at a resolution of 300 dpi.
These tests should show which unpaper settings give a decent automatic conversion to black and white, including despeckling and noise removal.
Automatic conversion of a single pdf file consisting of several pages to pgm files with pdftoppm (result should be 300 dpi and gray):
pdftoppm -gray -r 300 inputfile.pdf outputfile
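Check the file type of a generated page with identify (pdftoppm appends the page number and the .pgm extension to outputfile; the page number may be zero-padded, depending on the page count):
identify outputfile-1.pgm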
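If ImageMagick's identify is not at hand, the file type can also be read from the Netpbm magic number at the start of the file. The function below is only a minimal sketch of that alternative check, not a replacement for identify: the name netpbm_type is made up for this example, and it looks at the first header token only (P2/P5 for gray pgm, P1/P4 for black and white pbm, P3/P6 for color ppm), not at resolution or bit depth.

```sh
#!/bin/sh
# Report the Netpbm type of a file from its magic number (the first header token).
# netpbm_type is a hypothetical helper name, not part of unpaper or pdftoppm.
netpbm_type() {
    [ -r "$1" ] || { echo "unreadable"; return 1; }
    magic=
    # read returns non-zero when the file has no newline; the token is still set.
    read -r magic _rest < "$1" || :
    case "$magic" in
        P1|P4) echo "pbm" ;;      # black and white bitmap
        P2|P5) echo "pgm" ;;      # gray map
        P3|P6) echo "ppm" ;;      # color pixmap
        *)     echo "unknown" ;;
    esac
}
```

Calling netpbm_type on a page produced by pdftoppm -gray should print pgm, and on a file written by unpaper -t pbm it should print pbm.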
The following script tests unpaper with different settings and produces a pdf with 10 pages, one page per setting. The unpaper options used are annotated on each page:
#!/bin/sh
donot="--no-mask-scan --no-mask-center --no-deskew --no-wipe --no-border --no-border-scan --no-border-align --overwrite"
nmb=0
while [ "$nmb" -le 9 ]
do
  str1="-w 0.$nmb $donot -t pbm out.pgm outc.pbm"
  unpaper $str1
  # convert inputfile option annotation outputfile
  convert outc.pbm -gravity NorthWest -annotate 0 "unpaper $str1" outca.pbm
  convert -density 300 -units PixelsPerInch outca.pbm ps:- | ps2pdf13 -sPAPERSIZE=a4 - "out-$nmb.pdf"
  rm outc.pbm
  rm outca.pbm
  nmb=$(($nmb+1))
done
pdftk out-*.pdf output result.pdf
rm out-*.pdf
exit 0
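The script above steps through one decimal place (0.0 to 0.9). For the finer ranges tested further down, such as 0.10 to 0.19, the same idea can be sketched with a small value generator. The function names sweep_values, unpaper_args and run_sweep, and the output names sweep-VALUE.pbm, are assumptions made for this example; only the unpaper options themselves come from the script above.

```sh
#!/bin/sh
# Options that switch off every unpaper step not under test (same list as in the test script).
donot="--no-mask-scan --no-mask-center --no-deskew --no-wipe --no-border --no-border-scan --no-border-align --overwrite"

# Print ten values that extend a prefix by one digit: "0.1" gives 0.10 ... 0.19.
sweep_values() {
    i=0
    while [ "$i" -le 9 ]; do
        printf '%s\n' "$1$i"
        i=$((i + 1))
    done
}

# Print the unpaper argument string for one option/value pair.
unpaper_args() {
    printf '%s\n' "$1 $2 $donot -t pbm out.pgm outc.pbm"
}

# Run unpaper once per value and keep each result under its own (hypothetical) name.
run_sweep() {
    for value in $(sweep_values "$2"); do
        # Word splitting of the argument string is intended here.
        unpaper $(unpaper_args "$1" "$value") && mv outc.pbm "sweep-$value.pbm"
    done
}
```

With out.pgm in the current directory, run_sweep -b 0.1 would test -b 0.10 through -b 0.19, and run_sweep -b 0.2 the next range. The annotation and pdf steps of the original script are left out to keep the sketch short.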
unpaper --noisefilter-intensity (1-25) (.pdf) |
---|
option: -ni [1 … 25] Above a value of 10, tiny clusters of about 3×3 pixels are removed (presumably because their 9 pixels fall below the limit of 10). |
unpaper --white-threshold (.pdf) |
option: -w [0.0 … 0.9] At -w 0.6 a few light gray pixels suddenly become invisible. Higher and lower values show no difference. Strangely, the effect is not continuous across the range. |
unpaper --black-threshold (coarse) (.pdf) |
option: -b [0.0 … 0.9] Option -b 0.0 turns even the slightest light gray spots into black pixels, leaving large black areas that make the page hard to read. Option -b 0.1 turns most of the white/gray background white and turns remarks written in light gray pencil black. Single system lines are about 6 pixels high. Option -b 0.2 makes remarks written in light gray pencil invisible. Darker gray is still visible. Single system lines are about 4 pixels high. Option -b 0.3 also makes remarks written in light gray pencil invisible. Darker gray is still visible. Single system lines are about 2 pixels high. Option -b 0.4 makes system lines and other lines less than 3 pixels wide disappear almost completely. Strong black printed text is still visible. Normal printed text starts to fade. Option -b 0.5 and higher removes everything that was previously visible, leaving a white page. (The annotated text remains, because it was added after unpaper had run.) |
unpaper --black-threshold 0.10 ... 0.19 (.pdf) |
option: -b [0.10 … 0.19] Option -b 0.12 is the best compromise between keeping light gray pencil remarks and line thickness. Single system lines are about 4 pixels high. |
unpaper --black-threshold 0.20 … 0.29 (.pdf) |
option: -b [0.20 … 0.29] With -b 0.20, printed black symbols (such as the alla breve symbol) appear just as they do in the original. Single system lines are about 3 pixels high. Stems are 2 pixels wide. Higher settings gradually make all lines thinner. System lines develop spots where they are much thinner. At 0.29 and higher some stems disintegrate. |
unpaper --black-threshold 0.30 … 0.39 (.pdf) |
option: -b [0.30 … 0.39] At 0.30, symbols with thin vertical lines, such as crosses, disintegrate. System lines are still continuous and between 1 and 2 pixels high. Above 0.34, system lines largely disintegrate. |
100 pixels size: unpaper --blurfilter-intensity 0.00 ... 0.25 (.pdf) 8 pixels size: unpaper --blurfilter-intensity 0.00 ... 0.25 (.pdf) 4 pixels size: unpaper --blurfilter-intensity 0.00 ... 0.25 (.pdf) |
option: -b 0.12 -ls 100 -lp 50 -li [0.00 … 0.25]: At values above 0.07, large square areas (100×100 pixels) are removed. option: -b 0.12 -ls 8 -lp 4 -li [0.00 … 0.25]: When -li is at least 0.20, tiny isolated dots (dirt) of about 3×3 pixels are removed. |
Scan your raw material at 300 dpi in 8-bit gray.
With the following command, unpaper converts a gray image into a black and white image:
unpaper -b $black_treshold --no-mask-scan --no-mask-center --no-deskew --no-wipe --no-border --no-border-scan --no-border-align --overwrite -t pbm inputfile.pgm outputfile.pbm
The variable $black_treshold is a ratio between 0 and 1. A low ratio yields much darker images: in the tests above, -b 0.0 turned even the lightest gray into black, while -b 0.5 and higher left a white page. This is consistent with unpaper treating a pixel as black when its brightness (0 is black, 1 is white) lies below 1 minus the ratio, although that reading is inferred from the test results.
Under normal conditions, when the original scan has good visible contrast, $black_treshold should lie between 0.1 and 0.4. When the raw material is quite dark, $black_treshold may be 0.1 higher than usual. If the sheet music contains pencil remarks that should be kept in the output, a value of 0.12 may be useful. To show fewer pencil remarks, a value of 0.35 may be used.
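To apply one threshold to every page of a scan, the command above can be wrapped in a loop. The following is a minimal sketch under a few assumptions: the function name bw_convert and the DRY_RUN switch are invented for this example, input pages are expected to end in .pgm, and each output file gets the same name with a .pbm extension.

```sh
#!/bin/sh
# Options that switch off all unpaper steps except the black and white conversion.
donot="--no-mask-scan --no-mask-center --no-deskew --no-wipe --no-border --no-border-scan --no-border-align --overwrite"

# Usage: bw_convert RATIO page1.pgm [page2.pgm ...]
# Set DRY_RUN=1 to print the unpaper commands instead of running them.
bw_convert() {
    black_treshold=$1        # spelling follows the variable name used in the text
    shift
    for page in "$@"; do
        out="${page%.pgm}.pbm"   # page-1.pgm becomes page-1.pbm
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo unpaper -b "$black_treshold" $donot -t pbm "$page" "$out"
        else
            # $donot is left unquoted on purpose so it splits into separate options.
            unpaper -b "$black_treshold" $donot -t pbm "$page" "$out" || return 1
        fi
    done
}
```

For example, bw_convert 0.12 outputfile-*.pgm would keep light pencil remarks on all pages, while bw_convert 0.35 outputfile-*.pgm would show fewer of them. Running it once with DRY_RUN=1 first makes it easy to check the generated commands before any file is written.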
Black borders at edges of your scan can be removed automatically with this option. See unpaper user documentation for details.
=== 3.