This shows you the differences between two versions of the page.
software:unpaper_test [2009/04/28 03:53] – admin | software:unpaper_test [2015/04/22 21:51] (current) – [findblack] admin | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Unpaper test ===== | + | ====== Unpaper test ====== |
In order to better understand the vast number of options in unpaper, I've done several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed it manually with another application to remove skew. All actions were done in 8-bit gray at a resolution of 300 dpi.\\ This test should show which unpaper settings give a decent automatic conversion to black and white with despeckle and noise removal. | In order to better understand the vast number of options in unpaper, I've done several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed it manually with another application to remove skew. All actions were done in 8-bit gray at a resolution of 300 dpi.\\ This test should show which unpaper settings give a decent automatic conversion to black and white with despeckle and noise removal. | ||
Line 34: | Line 34: | ||
str1=" | str1=" | ||
- | unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-1.pgm | + | unpaper -b $str1 $donot -t pbm $ptmpf-1.pgm $ptmpf.pbm
# convert inputfile option annotation outputfile | # convert inputfile option annotation outputfile | ||
- | convert | + | convert $ptmpf.pbm -gravity |
- | convert -monochrome -density 300 -units PixelsPerInch | + | convert -monochrome -density 300 -units PixelsPerInch / |
- | rm /tmp/$ptmpf.pbm | + | rm $ptmpf.pbm $ptmpf_ca.pbm
- | pdflst=" | + | pdflst=" |
nmb=$((nmb+1)) | nmb=$((nmb+1)) | ||
done | done | ||
Line 83: | Line 83: | ||
===== Scripting with unpaper ===== | ===== Scripting with unpaper ===== | ||
+ | ==== gray2black ==== | ||
PDF files containing gray images can be converted automatically to black-and-white images, using unpaper's noise-reduction and clean-up features, with the following script: | PDF files containing gray images can be converted automatically to black-and-white images, using unpaper's noise-reduction and clean-up features, with the following script: | ||
<code bash> | <code bash> | ||
Line 101: | Line 102: | ||
# | # | ||
# Example: | # Example: | ||
- | # Process gray images in mozart.pdf: gray2black mozart.pdf -p 4 test.pdf | + | # Process gray images in mozart.pdf: gray2black mozart.pdf -b 0.23 test.pdf |
# | # | ||
# This script uses imagemagick to convert and center an image on an | # This script uses imagemagick to convert and center an image on an | ||
Line 114: | Line 115: | ||
# | # | ||
# User text | # User text | ||
- | _fbhelp=" | + | _fbhelp=" |
+ | _fbspdf=" | ||
_fbcvrt=" | _fbcvrt=" | ||
# Unpaper settings, start value for black-threshold is 0.12 | # Unpaper settings, start value for black-threshold is 0.12 | ||
Line 122: | Line 124: | ||
# Other settings | # Other settings | ||
ptmpf=" | ptmpf=" | ||
+ | |||
+ | # Page dimensions | ||
psize=" | psize=" | ||
- | pdim="2480x3508" | + | horpix=2480 |
+ | verpix=3508 | ||
+ | pdim="$horpix"x"$verpix" | ||
pres=" | pres=" | ||
+ | |||
+ | # Only resize if absolutely necessary. The border values below are used for the left/right (hborder) and top/bottom (vborder) borders. | ||
+ | hborder=33 | ||
+ | vborder=33 | ||
+ | lhorpix=$horpix | ||
+ | lverpix=$verpix | ||
+ | |||
# Store input-file name temporarily, | # Store input-file name temporarily, | ||
arg1=$1 | arg1=$1 | ||
+ | |||
# Find number of parameters passed | # Find number of parameters passed | ||
if [ " | if [ " | ||
Line 157: | Line 171: | ||
# only process further if there are two arguments left. | # only process further if there are two arguments left. | ||
if [ " | if [ " | ||
- | if [ ! " | + | if [ ! " |
echo " | echo " | ||
exit 1 | exit 1 | ||
else | else | ||
- | # convert first page to a pgm file with filename | + | # convert every page to a pgm file named $ptmpf-*.pgm
- | pdftoppm -gray -r 300 $arg1 /tmp/$ptmpf | + | pdftoppm -gray -r 300 "$arg1" |
+ | # Check whether the document needs resizing. Iterating through all pages makes sure all pages get the same resize factor later. | ||
+ | |||
+ | resize=0 | ||
+ | # Iterate through files. The files must be sorted in numerical order because pdftoppm names them without leading zeros. | ||
+ | # However before the numbers, a ' | ||
+ | for filenum in $(ls $ptmpf-*.pgm | cut -d' | ||
+ | dimension=" | ||
+ | # width: " | ||
+ | # height: " | ||
+ | if [ ${dimension%%x*} -gt $lhorpix ]; then | ||
+ | lhorpix=${dimension%%x*} | ||
+ | resize=1 | ||
+ | fi | ||
+ | if [ ${dimension##*x} -gt $lverpix ]; then | ||
+ | lverpix=${dimension##*x} | ||
+ | resize=1 | ||
+ | fi | ||
+ | done | ||
+ | # Resize document if necessary | ||
+ | if [ $resize -eq 1 ]; then | ||
+ | # image fits within horizontal boundary, so adaptation must be vertical | ||
+ | if [ $horpix -eq $lhorpix ]; then | ||
+ | vscale=$((($verpix-30)*100/
+ | else | ||
+ | vscale=100 | ||
+ | fi | ||
+ | # image fits within vertical boundary, so adaptation must be horizontal | ||
+ | if [ $verpix -eq $lverpix ]; then | ||
+ | hscale=$((($horpix-30)*100/
+ | else | ||
+ | hscale=100 | ||
+ | fi | ||
+ | # if scaling is necessary both horizontally and vertically, use the smaller factor | ||
+ | if [ $hscale -lt $vscale ]; then scale=$hscale; else scale=$vscale; fi | ||
+ | |||
+ | echo Some images exceed the maximum size. Now scaling to " | ||
+ | # iterate through all files and resize all with the same percentage, use 8 bits per pixel | ||
+ | for filenum in $(ls $ptmpf-*.pgm | cut -d'
+ | mogrify -depth 8 -resize " | ||
+ | echo resizing file: " | ||
+ | done | ||
+ | fi | ||
# apply unpaper onto each page | # apply unpaper onto each page | ||
- | for file in $(ls /tmp/$ptmpf-*.pgm); | + | for filenum |
- | unpaper -b $bthreshold $filter $donot -t pbm $file ${file%.pgm}.pbm | + | # Parameter expansion: ${param%word} removes the smallest part of param that matches word from the end and returns the rest. (p.72)
- | rm "$file" | + | unpaper -b $bthreshold $filter $donot -t pbm "$ptmpf-$filenum" "$ptmpf-
+ | rm "$ptmpf-$filenum" | ||
done | done | ||
# center pbm page on an a4 canvas, convert to pdf | # center pbm page on an a4 canvas, convert to pdf | ||
pdflst="" | pdflst="" | ||
- | for file in $(ls /tmp/$ptmpf-*.pbm); | + | for filenum |
- | convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$file" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "${file%.pbm}.pdf" | + | convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$ptmpf-$filenum" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4
- | rm "$file" | + | rm "$ptmpf-$filenum" |
- | pdflst=" | + | pdflst=" |
done | done | ||
- | # merge all pages into one file | + | # merge all pages into one file
- | gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst | + | gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH
rm $pdflst | rm $pdflst | ||
fi | fi | ||
Line 186: | Line 243: | ||
</code> | </code> | ||
+ | ==== findblack ==== | ||
Find a value for the black threshold with the following script: | Find a value for the black threshold with the following script: | ||
<code bash> | <code bash> | ||
Line 231: | Line 289: | ||
# Default page number | # Default page number | ||
defpage=1 | defpage=1 | ||
- | # Store input-file name temporarily, | + | |
- | arg1=$1 | + | |
# Find number of parameters passed | # Find number of parameters passed | ||
if [ " | if [ " | ||
Line 238: | Line 295: | ||
exit 1 | exit 1 | ||
fi | fi | ||
+ | |||
+ | # Store input-file name temporarily, | ||
+ | arg1=$1 | ||
# Check OPTIONS | # Check OPTIONS | ||
Line 280: | Line 340: | ||
fi | fi | ||
# convert first page to a pgm file with filename tmpfile_PID-1.pgm | # convert first page to a pgm file with filename tmpfile_PID-1.pgm | ||
- | pdftoppm -gray -r 300 -f $defpage -l $defpage " | + | pdftoppm -gray -r 300 -f $defpage -l $defpage "
# check for successful conversion to .pgm | # check for successful conversion to .pgm | ||
- | if [ ! -f "/tmp/$ptmpf-$trail$defpage.pgm" | + | if [ ! -f "
echo " | echo " | ||
- | rm "/tmp/$ptmpf-$trail$defpage.pgm" | + | rm "
exit 1 | exit 1 | ||
else | else | ||
Line 300: | Line 360: | ||
# process unpaper | # process unpaper | ||
- | unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-$trail$defpage.pgm | + | unpaper -b $str1 $donot -t pbm $ptmpf-$trail$defpage.pgm $ptmpf.pbm
# Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf | # Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf | ||
Line 307: | Line 367: | ||
# Sending result to next command with pipe: | | # Sending result to next command with pipe: | | ||
# Using this ' | # Using this ' | ||
- | convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center | + | convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center $ptmpf.pbm - miff:- | convert−pstr "ptxtstr1" - | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "
- | pdflst=" | + | pdflst=" |
nmb=$((nmb+$s)) | nmb=$((nmb+$s)) | ||
- | rm | + | rm $ptmpf.pbm |
done | done | ||
- | rm /tmp/$ptmpf-$trail$defpage.pgm | + | rm $ptmpf-$trail$defpage.pgm
# Merge all pdf files in $pdflst into one single file with filename $2 | # Merge all pdf files in $pdflst into one single file with filename $2 | ||
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst | gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst | ||
Line 322: | Line 382: | ||
exit 0 | exit 0 | ||
</code> | </code> | ||
+ | |||
+ | ===== Bugs ===== | ||
+ | [[software: |
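The resize check added to gray2black reads each page's width and height with parameter expansion, keeps the largest values, and then resizes all pages with the same percentage. Below is a minimal standalone sketch of that idea, not the script's own code: the function name ''uniform_scale'' is hypothetical, it assumes the page sizes are passed in as WIDTHxHEIGHT strings, and it leaves out the 30-pixel margin the script subtracts before computing the percentage.

```bash
#!/usr/bin/env bash
# Sketch only: compute one common resize percentage for a set of pages,
# modelled on the resize check in gray2black. The function name and the
# WIDTHxHEIGHT argument format are assumptions; the 30-pixel margin used
# by the script is deliberately left out.

# uniform_scale TARGET_W TARGET_H DIM...
# Prints 100 when every page fits the target canvas, otherwise the largest
# whole percentage at which the widest and the tallest page both fit.
uniform_scale() {
    local target_w=$1 target_h=$2
    shift 2
    local max_w=$target_w max_h=$target_h dim w h
    for dim in "$@"; do
        w=${dim%%x*}   # remove the longest suffix matching 'x*' -> width
        h=${dim##*x}   # remove the longest prefix matching '*x' -> height
        if (( w > max_w )); then max_w=$w; fi
        if (( h > max_h )); then max_h=$h; fi
    done
    # Integer division rounds down, so a page never ends up too large.
    local hscale=$(( target_w * 100 / max_w ))
    local vscale=$(( target_h * 100 / max_h ))
    # The smaller percentage satisfies both the width and the height limit.
    if (( hscale < vscale )); then
        echo "$hscale"
    else
        echo "$vscale"
    fi
}

# A4 at 300 dpi is 2480x3508 pixels, the canvas size used in the script.
uniform_scale 2480 3508 2480x3508 2600x3400 2400x3600   # prints 95
```

Computing both percentages and taking the smaller one means a page that is too wide and too tall at the same time is still handled with a single factor. For real page images, the WIDTHxHEIGHT strings could come from ImageMagick, for example ''identify -format '%wx%h' page.pgm'', and the resulting percentage would then be handed to ''mogrify -resize'' as the script does.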