software:unpaper_test

Differences

This shows you the differences between two versions of the page.

software:unpaper_test [2009/05/14 01:13] admin
software:unpaper_test [2009/08/22 06:45] admin
Line 1: Line 1:
-====== Unpaper test ===== +====== Unpaper test ======
 In order to better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed the scans manually with another application to remove skew. All steps were done in 8-bit gray at a resolution of 300 dpi.\\ These tests should show which unpaper settings give a decent automatic conversion to black and white, including despeckling and noise removal. In order to better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed the scans manually with another application to remove skew. All steps were done in 8-bit gray at a resolution of 300 dpi.\\ These tests should show which unpaper settings give a decent automatic conversion to black and white, including despeckling and noise removal.
  
Line 83: Line 83:
  
 ===== Scripting with unpaper ===== ===== Scripting with unpaper =====
 +==== gray2black ====
 PDF files containing gray images can be converted automatically to black and white images, using unpaper's noise reduction and clean-up features, with the following script: PDF files containing gray images can be converted automatically to black and white images, using unpaper's noise reduction and clean-up features, with the following script:
 <code bash> <code bash>
Line 101: Line 102:
 # #
 # Example: # Example:
-# Process gray images in mozart.pdf: gray2black mozart.pdf -p 4 test.pdf +# Process gray images in mozart.pdf: gray2black mozart.pdf -b 0.23 test.pdf
  
 # This script uses imagemagick to convert and center an image on an # This script uses imagemagick to convert and center an image on an
Line 132: Line 133:
  
 # Only resize if absolutely necessary. Values for resizing are for left, right, top and bottom border. # Only resize if absolutely necessary. Values for resizing are for left, right, top and bottom border.
-resize=0 
 hborder=33 hborder=33
 vborder=33 vborder=33
 lhorpix=$horpix lhorpix=$horpix
 lverpix=$verpix lverpix=$verpix
 +
 +# Store input-file name temporarily, because it's deleted by shift command.
 +arg1=$1
  
 # Find number of parameters passed # Find number of parameters passed
Line 143: Line 146:
  exit 1  exit 1
 fi fi
- 
-# Store input-file name temporarily, because it's deleted by shift command. 
-arg1=$1 
  
 # Check OPTIONS # Check OPTIONS
Line 180: Line 180:
  # Check whether the document needs resizing. Iterating through all pages makes sure that all pages get uniform resizing later.  # Check whether the document needs resizing. Iterating through all pages makes sure that all pages get uniform resizing later.
  
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + resize=0 
 - dimension="$(identify -format "%[fx:w]x%[fx:h]" $file)" + # Iterate through files. They must be sorted numerically because pdftoppm numbers its output files without leading zeros. 
  + # pdftoppm puts a '-' delimiter before the page number, so cut away the part before it and sort on what is left. 
 + for filenum in $(ls /tmp/$ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
 + dimension="$(identify -format "%[fx:w]x%[fx:h]" /tmp/$ptmpf'-'$filenum)"
  #  width: "${dimension%%x*}" -> From the end removes the longest part of dimension that matches x* and returns the rest.  #  width: "${dimension%%x*}" -> From the end removes the longest part of dimension that matches x* and returns the rest.
  #  height: "${dimension##*x}" -> From the beginning removes the longest part of dimension that matches *x and returns the rest.  #  height: "${dimension##*x}" -> From the beginning removes the longest part of dimension that matches *x and returns the rest.
Line 193: Line 196:
  fi  fi
  done  done
- 
  # Resize document if necessary  # Resize document if necessary
  if [ $resize -eq 1 ]; then  if [ $resize -eq 1 ]; then
Line 213: Line 215:
  echo Some images exceed the maximum size. Now scaling to "$scale"%.  echo Some images exceed the maximum size. Now scaling to "$scale"%.
  # iterate through all files and resize all with the same percentage, use 8 bits per pixel  # iterate through all files and resize all with the same percentage, use 8 bits per pixel
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + for filenum in $(ls /tmp/$ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
 - mogrify -depth 8 -resize "$scale"% $file + mogrify -depth 8 -resize "$scale"% "/tmp/$ptmpf-$filenum" 
 - echo resizing file: "$file" + echo resizing file: "/tmp/$ptmpf-$filenum"
  done  done
  fi  fi
  # apply unpaper onto each page  # apply unpaper onto each page
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + for filenum in $(ls /tmp/$ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
 - # Parameter expansion: ${param%word} removes, from the end, the smallest part of param that matches word and returns the rest. (p.72) 
- unpaper -b $b_threshold $filter $donot -t pbm $file ${file%.pgm}.pbm + unpaper -b $b_threshold $filter $donot -t pbm "/tmp/$ptmpf-$filenum" "/tmp/$ptmpf-"${filenum%.pgm}.pbm 
- rm "$file"+ rm "/tmp/$ptmpf-$filenum"
  done  done
  
  # center pbm page on an a4 canvas, convert to pdf  # center pbm page on an a4 canvas, convert to pdf
  pdflst=""  pdflst=""
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pbm); do + for filenum in $(ls /tmp/$ptmpf-*.pbm | cut -d'-' -f2 | sort -n); do 
- convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$file" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "${file%.pbm}.pdf" + convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "/tmp/$ptmpf-$filenum" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "/tmp/$ptmpf-"${filenum%.pbm}.pdf 
 - rm "$file" + rm "/tmp/$ptmpf-$filenum" 
 - pdflst="$pdflst ${file%.pbm}.pdf" + pdflst="$pdflst /tmp/$ptmpf-"${filenum%.pbm}.pdf
  done  done
  
Line 241: Line 243:
 </code> </code>
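The numeric sorting used in the loops of gray2black can also be tried out on its own. The following is only a minimal sketch and not part of the script: the function name list_pages and its two arguments are made up for illustration, and it assumes file names of the form prefix-number.extension, which is how the script expects pdftoppm to name its output.

```shell
#!/bin/sh
# Hypothetical helper (not part of gray2black): print page files in page order.
# Assumes names like <prefix>-<number>.<ext>, e.g. /tmp/scan-10.pgm.
list_pages() {
    prefix=$1   # path and base name, e.g. /tmp/scan
    ext=$2      # extension without the dot, e.g. pgm
    for f in "$prefix"-*."$ext"; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matched
        num=${f##*-}                 # drop everything up to the last '-': "10.pgm"
        num=${num%."$ext"}           # drop the extension: "10"
        printf '%s %s\n' "$num" "$f" # prefix each path with its page number
    done | sort -n | cut -d' ' -f2-  # sort numerically, then strip the number again
}
```

Called as list_pages /tmp/scan pgm, this should list scan-2.pgm before scan-10.pgm, which a plain alphabetical listing would not. Taking the number from after the last '-' is a design choice of this sketch; it keeps the order intact even if the prefix itself contains a '-'.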
  
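The parameter expansions described in the comments of gray2black can be checked with a few sample values. This is only an illustration: the values assigned to dimension and file are invented, and the variable names simply follow the script comments.

```shell
#!/bin/sh
# Sample values, made up for illustration only.
dimension="2480x3508"
file="/tmp/scan-3.pgm"

# %% removes the longest match of the pattern from the end:
# here everything from the first 'x' onwards.
width=${dimension%%x*}

# ## removes the longest match of the pattern from the beginning:
# here everything up to and including the last 'x'.
height=${dimension##*x}

# % removes the shortest match from the end: here just the .pgm suffix,
# so a new extension can be appended.
pbm="${file%.pgm}.pbm"

echo "width=$width height=$height pbm=$pbm"
# prints: width=2480 height=3508 pbm=/tmp/scan-3.pbm
```

These forms are standard shell parameter expansion, so no external tool such as cut or sed is needed to split the output of identify or to swap a file extension.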
 +==== findblack ====
 Find a value for the black threshold with the following script: Find a value for the black threshold with the following script:
 <code bash> <code bash>
software/unpaper_test.txt · Last modified: 2015/04/22 21:51 by admin