software:unpaper_test

Differences

This shows you the differences between two versions of the page.

software:unpaper_test [2009/04/28 04:59] adminsoftware:unpaper_test [2015/04/22 21:51] (current) – [findblack] admin
Line 1: Line 1:
-====== Unpaper test =====+====== Unpaper test ======
 To better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and deskewed the scans manually with another application. All processing was done in 8-bit gray at a resolution of 300 dpi.\\ The tests should show which unpaper settings give a decent automatic conversion to black and white with despeckling and noise removal. To better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and deskewed the scans manually with another application. All processing was done in 8-bit gray at a resolution of 300 dpi.\\ The tests should show which unpaper settings give a decent automatic conversion to black and white with despeckling and noise removal.
  
Line 34: Line 34:
  str1="0.$nmb"  str1="0.$nmb"
  
- unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-1.pgm /tmp/$ptmpf.pbm+ unpaper -b $str1 $donot -t pbm $ptmpf-1.pgm $ptmpf.pbm
  
  # convert inputfile option annotation outputfile  # convert inputfile option annotation outputfile
- convert /tmp/$ptmpf.pbm -gravity NorthWest -annotate 0 "unpaper -$str1 $donot" /tmp/$ptmpf_ca.pbm + convert $ptmpf.pbm -gravity center -background lightblue -font Open-Sans -pointsize 60 caption:"$str1 \n $donot" -composite ${ptmpf}_ca.pbm 
- convert -monochrome -density 300 -units PixelsPerInch /tmp/$ptmpf_ca.pbm ps:- | ps2pdf13 -sPAPERSIZE=a4 - "/tmp/$ptmpf-$nmb.pdf" + convert -monochrome -density 300 -units PixelsPerInch ${ptmpf}_ca.pbm ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-$nmb.pdf" 
- rm /tmp/$ptmpf.pbm /tmp/$ptmpf_ca.pbm + rm $ptmpf.pbm ${ptmpf}_ca.pbm 
- pdflst="$pdflst /tmp/$ptmpf-$nmb.pdf"+ pdflst="$pdflst $ptmpf-$nmb.pdf"
  nmb=$(($nmb+1))  nmb=$(($nmb+1))
 done done
Line 83: Line 83:
  
 ===== Scripting with unpaper ===== ===== Scripting with unpaper =====
 +==== gray2black ====
 The following script automatically converts PDF files containing gray images to black and white, using unpaper's noise reduction and clean-up features: The following script automatically converts PDF files containing gray images to black and white, using unpaper's noise reduction and clean-up features:
 <code bash> <code bash>
Line 101: Line 102:
 # #
 # Example: # Example:
-# Process gray images in mozart.pdf: gray2black mozart.pdf -p 4 test.pdf+# Process gray images in mozart.pdf: gray2black mozart.pdf -b 0.23 test.pdf
  
 # This script uses imagemagick to convert and center an image on an # This script uses imagemagick to convert and center an image on an
Line 114: Line 115:
 # #
 # User text  # User text 
-_fbhelp="Usage: gray2black input-file [OPTION...] output-file\n\n  -b [value], specify black threshold value as being used in\n              unpaper with -b. If omitted, default value will\n              be used: -b 0.12.\n\nExample:\nProcess pdf file with unpaper black threshold 0.23:\n  gray2black mozart.pdf -b 0.23 result.pdf\nProcess pdf file with -b 0.12 settings:\n  gray2black mozart.pdf result.pdf"_fbspdf="please supply a pdf file"+_fbhelp="Usage: gray2black input-file [OPTION...] output-file\n\n  -b [value], specify black threshold value as being used in\n              unpaper with -b. If omitted, default value will\n              be used: -b 0.12.\n\nExample:\nProcess pdf file with unpaper black threshold 0.23:\n  gray2black mozart.pdf -b 0.23 result.pdf\nProcess pdf file with -b 0.12 settings:\n  gray2black mozart.pdf result.pdf" 
 +_fbspdf="please supply a pdf file"
 _fbcvrt="Check your pdf file. Does it contain gray images?" _fbcvrt="Check your pdf file. Does it contain gray images?"
 # Unpaper settings, start value for black-threshold is 0.12 # Unpaper settings, start value for black-threshold is 0.12
Line 122: Line 124:
 # Other settings # Other settings
 ptmpf="tempfile_gray2black_$$" ptmpf="tempfile_gray2black_$$"
 +
 +# Page dimensions
 psize="a4" psize="a4"
-pdim="2480x3508"+horpix=2480 
 +verpix=3508 
 +pdim="$horpix"x"$verpix"
 pres="300" pres="300"
 +
 +# Only resize if absolutely necessary. Values for resizing are for left, right, top and bottom border.
 +hborder=33
 +vborder=33
 +lhorpix=$horpix
 +lverpix=$verpix
 +
 # Store input-file name temporarily, because it's deleted by shift command. # Store input-file name temporarily, because it's deleted by shift command.
 arg1=$1 arg1=$1
 +
 # Find number of parameters passed # Find number of parameters passed
 if [ "$#" -le 1 ]; then if [ "$#" -le 1 ]; then
Line 161: Line 175:
  exit 1  exit 1
  else  else
- # convert first page to a pgm file with filename /tmp/$ptmpf-*.pgm + # convert first page to a pgm file with filename $ptmpf-*.pgm 
- pdftoppm -gray -r 300 "$arg1" /tmp/$ptmpf+ pdftoppm -gray -r 300 "$arg1" $ptmpf
  
 + # Check whether the document needs resizing. Iterating through all pages first ensures that every page is later resized by the same factor.
 +
 + resize=0
 + # Iterate through the files in numerical order; the sort is needed because pdftoppm generates page numbers without a leading zero. 
 + # pdftoppm puts a '-' delimiter before the page number, so cut away everything up to and including it.
 + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do
 + dimension="$(identify -format "%[fx:w]x%[fx:h]" $ptmpf'-'$filenum)"
 + #  width: "${dimension%%x*}" -> From the end removes the longest part of dimension that matches x* and returns the rest.
 + #  height: "${dimension##*x}" -> From the beginning removes the longest part of dimension that matches *x and returns the rest.
 + if [ ${dimension%%x*} -gt $lhorpix ]; then
 + lhorpix=${dimension%%x*}
 + resize=1
 + fi
 + if [ ${dimension##*x} -gt $lverpix ]; then
 + lverpix=${dimension##*x}
 + resize=1
 + fi
 + done
 + # Resize document if necessary
 + if [ $resize -eq 1 ]; then
 + # largest page is taller than the canvas, so scale vertically
 + if [ $lverpix -gt $verpix ]; then
 + vscale=$(( ($verpix-30)*100/$lverpix ))
 + else
 + vscale=100
 + fi
 + # largest page is wider than the canvas, so scale horizontally
 + if [ $lhorpix -gt $horpix ]; then
 + hscale=$(( ($horpix-30)*100/$lhorpix ))
 + else
 + hscale=100
 + fi
 + # use the smaller percentage so that pages fit both horizontally and vertically
 + if [ $hscale -lt $vscale ]; then scale=$hscale; else scale=$vscale; fi
 +
 + echo Some images exceed the maximum size. Now scaling to "$scale"%.
 + # iterate through all files and resize all with the same percentage, use 8 bits per pixel
 + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do
 + mogrify -depth 8 -resize "$scale"% "$ptmpf-$filenum"
 + echo resizing file: "$ptmpf-$filenum"
 + done
 + fi
  # apply unpaper onto each page  # apply unpaper onto each page
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
- unpaper -b $b_threshold $filter $donot -t pbm $file ${file%.pgm}.pbm + # Parameter expansion: ${param%word} removes, from the end, the smallest part of param that matches word and returns the rest. (p.72) 
- rm "$file"+ unpaper -b $b_threshold $filter $donot -t pbm "$ptmpf-$filenum" "$ptmpf-"${filenum%.pgm}.pbm 
 + rm "$ptmpf-$filenum"
  done  done
  
  # center pbm page on an a4 canvas, convert to pdf  # center pbm page on an a4 canvas, convert to pdf
  pdflst=""  pdflst=""
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pbm); do + for filenum in $(ls $ptmpf-*.pbm | cut -d'-' -f2 | sort -n); do 
- convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$file" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "${file%.pbm}.pdf" + convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$ptmpf-$filenum" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-"${filenum%.pbm}.pdf 
- rm "$file" + rm "$ptmpf-$filenum"
- pdflst="$pdflst ${file%.pbm}.pdf"+ pdflst="$pdflst $ptmpf-"${filenum%.pbm}.pdf
  done  done
  
- # merge all pages into one file + # merge all pages into one PDF file 
- gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst + gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$2" $pdflst
  rm $pdflst  rm $pdflst
  fi  fi
Line 186: Line 243:
 </code> </code>
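
The resize step in gray2black splits an `identify` result such as `2600x3508` with parameter expansion and then derives one percentage for all pages. Below is a minimal sketch of that calculation, assuming the script's A4 canvas (2480x3508 pixels at 300 dpi) and its 30-pixel margin. The function name `fit_scale` is hypothetical and not part of the script. The sketch evaluates each direction independently, so a page that exceeds both limits is scaled by the smaller factor.

```shell
#!/bin/bash
# Hypothetical helper for illustration, not part of gray2black.
# Usage: fit_scale WIDTHxHEIGHT -> prints an integer percentage (at most 100)
fit_scale() {
    local dimension=$1
    local horpix=2480 verpix=3508 margin=30   # A4 at 300 dpi, margin as used in the script
    local width=${dimension%%x*}              # remove the longest match of "x*" from the end
    local height=${dimension##*x}             # remove the longest match of "*x" from the beginning
    local hscale=100 vscale=100 scale

    if [ "$width" -gt "$horpix" ]; then
        hscale=$(( (horpix - margin) * 100 / width ))
    fi
    if [ "$height" -gt "$verpix" ]; then
        vscale=$(( (verpix - margin) * 100 / height ))
    fi
    # the smaller percentage makes the page fit in both directions
    if [ "$hscale" -lt "$vscale" ]; then scale=$hscale; else scale=$vscale; fi
    echo "$scale"
}

fit_scale 2480x3508   # page fits
fit_scale 2600x3508   # too wide
fit_scale 2600x4000   # too wide and too tall
```

The arithmetic is integer-only, as in the script, so percentages are rounded down. That errs on the side of a slightly smaller page.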
  
 +==== findblack ====
 Find a suitable black-threshold value with the following script: Find a suitable black-threshold value with the following script:
 <code bash> <code bash>
Line 231: Line 289:
 # Default page number # Default page number
 defpage=1 defpage=1
-# Store input-file name temporarily, because it's deleted by shift command. +
-arg1=$1+
 # Find number of parameters passed # Find number of parameters passed
 if [ "$#" -le 1 ]; then if [ "$#" -le 1 ]; then
Line 238: Line 295:
  exit 1  exit 1
 fi fi
 +
 +# Store input-file name temporarily, because it's deleted by shift command.
 +arg1=$1
  
 # Check OPTIONS # Check OPTIONS
Line 280: Line 340:
  fi  fi
  # convert first page to a pgm file with filename tmpfile_PID-1.pgm  # convert first page to a pgm file with filename tmpfile_PID-1.pgm
- pdftoppm -gray -r 300 -f $defpage -l $defpage "$arg1" /tmp/$ptmpf+ pdftoppm -gray -r 300 -f $defpage -l $defpage "$arg1" $ptmpf
  # check for successful conversion to .pgm  # check for successful conversion to .pgm
- if [ ! -f "/tmp/$ptmpf-$trail$defpage.pgm" ]; then+ if [ ! -f "$ptmpf-$trail$defpage.pgm" ]; then
  echo "$_fbcvrt"  echo "$_fbcvrt"
- rm "/tmp/$ptmpf-$trail$defpage.pgm"+ rm "$ptmpf-$trail$defpage.pgm"
  exit 1  exit 1
  else  else
Line 300: Line 360:
    
  # process unpaper  # process unpaper
- unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-$trail$defpage.pgm /tmp/$ptmpf.pbm+ unpaper -b $str1 $donot -t pbm $ptmpf-$trail$defpage.pgm $ptmpf.pbm
    
  # Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf  # Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf
Line 307: Line 367:
  # Sending result to next command with pipe: |  # Sending result to next command with pipe: |
  # Using this 'technique' we don't need to create temporary files.  # Using this 'technique' we don't need to create temporary files.
- convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center /tmp/$ptmpf.pbm - miff:- | convert - $pstr "$ptxt $str1" - | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "/tmp/$ptmpf-$nmb.pdf"+ convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center $ptmpf.pbm - miff:- | convert - $pstr "$ptxt $str1" - | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-$nmb.pdf"
    
- pdflst="$pdflst /tmp/$ptmpf-$nmb.pdf"+ pdflst="$pdflst $ptmpf-$nmb.pdf"
  nmb=$(($nmb+$s))  nmb=$(($nmb+$s))
- rm /tmp/$ptmpf.pbm+ rm $ptmpf.pbm
  done  done
- rm /tmp/$ptmpf-$trail$defpage.pgm+ rm $ptmpf-$trail$defpage.pgm
  # Merge all pdf files in $pdflst into one single file with filename $2  # Merge all pdf files in $pdflst into one single file with filename $2
  gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst  gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst
Line 322: Line 382:
 exit 0 exit 0
 </code> </code>
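
findblack builds each black-threshold value as the string `0.$nmb` and advances `nmb` by a step on every pass. Below is a small sketch of that idea, assuming the start, end and step are given in hundredths. The function name `thresholds` and its three arguments are hypothetical. It formats with `printf` so that a single-digit counter is zero-padded; plain concatenation would turn a counter of 5 into `0.5` instead of `0.05`.

```shell
#!/bin/bash
# Hypothetical helper for illustration, not part of findblack.
# Usage: thresholds START END STEP   (all in hundredths, e.g. 5 20 5)
thresholds() {
    local nmb=$1 last=$2 step=$3
    while [ "$nmb" -le "$last" ]; do
        printf '0.%02d\n' "$nmb"      # zero-pad to two digits: 5 -> 0.05
        nmb=$(( nmb + step ))
    done
}

thresholds 5 20 5
```

Each printed value could be passed to `unpaper -b` in a loop like the one in the script.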
 +
 +===== Bugs =====
 +[[software:unpaper_test:bugs|See the following page]]
software/unpaper_test.1240887584.txt.gz · Last modified: 2009/04/28 04:59 by admin