Processing sheet music

Over a period of one year I experimented extensively with scanning sheet music and quickly converting it into high-quality PDF files. The experiments resulted in the following useful software:

  • FlattenANDCompactPDF_A4Size: converts any PDF file, autonomously and accurately, into a compact PDF file that contains only black-and-white bitmaps and fits A4 exactly. Before any processing it analyzes all pages of the source file, to compensate for possible rasterization rounding errors. This guarantees that an A4 page stored as a 2480×3508-pixel bitmap keeps exactly that size. If one or more pages are larger than A4, they are reduced to A4, and the largest page determines a single scale factor that is applied to all pages. Smaller bitmaps are centered on the page with white space around them. Portrait and landscape orientations are recognized automatically. The output resolution defaults to 300 dpi but can be set to any other value; see the code for instructions.
    The program is also useful for combining the layers of a PDF file into a single layer. Extra layers appear when a PDF file has been annotated, and they make the file larger; this program merges them into one layer.
  • pdf2tif_300dpi: converts a PDF file accurately into a series of TIFF files (one file per page) at the resolution specified in the filename, again compensating for possible rasterization rounding errors. The TIFF files are encoded specifically for use with ScanTailor, so ScanTailor's "Fix DPI" step can be skipped.
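
Both programs compensate for rasterization rounding errors, but the page does not say how. The Python sketch below shows one plausible approach; the function names and the 2-pixel tolerance are assumptions, not taken from the programs. A4 measures 210 × 297 mm, about 595.28 × 841.89 PostScript points (1 pt = 1/72 inch). At 300 dpi that is 2480.31 × 3507.87 pixels, which rounds to the 2480×3508 raster. Many PDF files, however, store A4 as 595 × 842 pt, which rasterizes to 2479.17 × 3508.33 and rounds to 2479 × 3508, one pixel too narrow. The sketch snaps such near-A4 results to the exact A4 raster in either orientation.

```python
# Hypothetical sketch: pixel size of a rasterized PDF page, with
# near-A4 results snapped to the exact A4 raster (assumed tolerance).

MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0


def a4_pixels(dpi: int) -> tuple[int, int]:
    """Exact portrait A4 raster (width, height) at the given resolution."""
    return round(210 / MM_PER_INCH * dpi), round(297 / MM_PER_INCH * dpi)


def page_pixels(width_pt: float, height_pt: float, dpi: int,
                tolerance: int = 2) -> tuple[int, int]:
    """Raster size of a page given in points.

    If the rounded size lies within `tolerance` pixels of A4 (portrait
    or landscape), the exact A4 raster is returned instead.
    """
    w = round(width_pt / POINTS_PER_INCH * dpi)
    h = round(height_pt / POINTS_PER_INCH * dpi)
    aw, ah = a4_pixels(dpi)
    for tw, th in ((aw, ah), (ah, aw)):  # portrait first, then landscape
        if abs(w - tw) <= tolerance and abs(h - th) <= tolerance:
            return tw, th
    return w, h  # not A4: keep the plain rounded size


print(a4_pixels(300))              # (2480, 3508)
print(page_pixels(595, 842, 300))  # (2480, 3508) instead of (2479, 3508)
print(page_pixels(842, 595, 300))  # (3508, 2480)
```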

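The sizing rule of FlattenANDCompactPDF_A4Size (one common scale factor set by the page that overflows A4 the most, smaller pages centered, orientation taken from width versus height) can be sketched as follows. This is a hypothetical reconstruction from the description, not the program's code. It assumes page sizes are already given in pixels at the output resolution, that pages are never enlarged, and that wider-than-tall pages go onto a landscape A4 canvas.

```python
# Hypothetical sketch of the "largest page sets the scale" rule.

def fit_to_a4(pages, a4=(2480, 3508)):
    """Fit pages, given as (width, height) in pixels, onto A4 canvases.

    Returns (scale, placements). Each placement is
    (canvas_w, canvas_h, new_w, new_h, x_offset, y_offset).
    """
    portrait = a4
    landscape = (a4[1], a4[0])

    def canvas_for(w, h):
        # Pages wider than tall are treated as landscape.
        return landscape if w > h else portrait

    # One scale factor for all pages; never enlarge (scale <= 1).
    scale = 1.0
    for w, h in pages:
        cw, ch = canvas_for(w, h)
        scale = min(scale, cw / w, ch / h)

    placements = []
    for w, h in pages:
        cw, ch = canvas_for(w, h)
        nw, nh = round(w * scale), round(h * scale)
        # Center the (possibly reduced) bitmap, leaving white space around it.
        placements.append((cw, ch, nw, nh, (cw - nw) // 2, (ch - nh) // 2))
    return scale, placements


# An A4 page together with an A3 page (3508x4961 px at 300 dpi):
scale, placements = fit_to_a4([(2480, 3508), (3508, 4961)])
print(placements)  # [(2480, 3508, 1753, 2480, 363, 514), (2480, 3508, 2480, 3507, 0, 0)]
```

In this example the A3 page fills the A4 canvas, while the A4 page is reduced by the same factor and centered. A plausible reason for a shared factor is that it keeps the notation the same size on every page, at the cost of shrinking pages that would have fitted on their own.
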
Workflow

My workflow for scanning sheet music quickly and converting it into print-ready A4 PDF files is:

  1. Acquire a fast, good scanner. (I started with an Epson GT-10000 and later replaced it with an Epson GT-30000, which is much faster still. The GT-30000 is faster than I can turn pages, so I never have to wait for it.)
  2. Use VueScan to automatically scan one A3 page every 6 seconds as a 300 dpi, 8-bit grayscale TIFF file. Alternatively, when someone sends me an unprocessed PDF file, I use pdf2tif_300dpi to convert it into single grayscale TIFF pages at the proper resolution.
  3. Use ScanTailor to rotate, straighten, remove borders, filter and convert the pages to black-and-white bitmaps. I extensively tested other software such as unpaper, but found ScanTailor much faster and much more accurate. I also experimented a lot with the image-processing features of Neuratron PhotoScore, but ScanTailor is still much faster and gives better control for fine-tuning the output.
  4. Use Adobe Acrobat to assemble all processed TIFF pages into a single PDF file. This could also easily be done with a script that uses ImageMagick and Ghostscript, but my time was limited and I could not write yet another program, so I chose Acrobat.
  5. Use FlattenANDCompactPDF_A4Size to autonomously resize and/or convert the PDF file from step 4 into an A4 PDF file of exactly 2480×3508 (or 3508×2480) pixels with CCITT Group 4 (G4) encoding1). With this encoding, pages average about 60 kB.
  • Optional: Use PDF Annotator with a pen tablet2) to write instructions (such as bowings) directly into the PDF file. Afterwards, run FlattenANDCompactPDF_A4Size a second time to merge the annotations and the music into one bitmap and produce a compact PDF file again.
  • Idea 1: The FlattenANDCompactPDF_A4Size program could be modified to accept TIFF files in addition to PDF files. Given a sequence of TIFF images, it would automatically resize them to A4 and combine them into one PDF file. This would merge steps 4 and 5 into a single step and make the workflow more efficient.
  • Idea 2: I would like to add a feature to the script that automatically makes the total number of pages a multiple of 4 (if the document has three or more pages). If blank pages end up in the wrong place, they are easy to move or delete. With this feature, booklet printing will be more consistent.
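
Step 4 notes that a script could replace Acrobat for assembling the pages. Below is a minimal Python sketch of that idea, assuming ImageMagick is installed as `convert` (version 6) or `magick` (version 7) and that the processed pages are black-and-white TIFF files whose zero-padded names sort in page order. ImageMagick can write PDF files itself, so this sketch does not call Ghostscript. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_assemble_command(pages: list[str], output_pdf: str,
                           binary: str = "convert") -> list[str]:
    """ImageMagick command joining `pages`, in the given order, into one PDF.

    `-compress Group4` requests CCITT G4 compression for the written PDF,
    which suits bilevel (black-and-white) pages.
    """
    if not pages:
        raise ValueError("no TIFF pages given")
    return [binary, *pages, "-compress", "Group4", output_pdf]


def assemble_directory(tiff_dir: str, output_pdf: str) -> None:
    """Join every *.tif file in tiff_dir into output_pdf."""
    # Assumes zero-padded file names, so alphabetical order equals page order.
    pages = sorted(str(p) for p in Path(tiff_dir).glob("*.tif"))
    # ImageMagick 7 provides 'magick'; ImageMagick 6 provides 'convert'.
    binary = "magick" if shutil.which("magick") else "convert"
    subprocess.run(build_assemble_command(pages, output_pdf, binary), check=True)
```

For example, `build_assemble_command(["p001.tif", "p002.tif"], "book.pdf")` yields the command `convert p001.tif p002.tif -compress Group4 book.pdf`. Keeping the command builder separate from `subprocess.run` makes it easy to check what would be executed before running it.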
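
To put the 60 kB average from step 5 in perspective, here is the arithmetic for one A4 page at 300 dpi, assuming 1 bit per pixel for the uncompressed black-and-white raster and 1 kB = 1000 bytes:

```latex
\begin{aligned}
2480 \times 3508 &= 8\,699\,840\ \text{pixels} \\
\frac{8\,699\,840\ \text{bit}}{8\ \text{bit/byte}} &= 1\,087\,480\ \text{bytes} \approx 1.09\ \text{MB (uncompressed, 1 bit per pixel)} \\
\frac{1\,087\,480\ \text{bytes}}{60\,000\ \text{bytes}} &\approx 18
\end{aligned}
```

So G4 encoding stores a page in roughly one eighteenth of its raw 1-bit size. Compared with an 8-bit grayscale raster of the same A4 page (8,699,840 bytes, about 8.7 MB), the reduction is about 145×.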
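
Idea 2 comes down to a small calculation: a folded booklet sheet carries four pages, so a document with n ≥ 3 pages needs (−n mod 4) blank pages appended. A Python sketch follows; the function names are hypothetical, and blank pages are represented by a placeholder value appended at the end, where they can be moved or deleted as described in Idea 2.

```python
def blank_pages_needed(page_count: int) -> int:
    """Blank pages to append so the total becomes a multiple of 4.

    Documents with fewer than three pages are left unchanged.
    """
    if page_count < 3:
        return 0
    # Python's % always returns a non-negative result for a positive divisor.
    return (-page_count) % 4


def pad_for_booklet(pages: list, blank=None) -> list:
    """Return a copy of `pages` with placeholder blanks appended at the end."""
    return pages + [blank] * blank_pages_needed(len(pages))


print([blank_pages_needed(n) for n in range(1, 11)])  # [0, 0, 1, 0, 3, 2, 1, 0, 3, 2]
print(len(pad_for_booklet(list(range(5)))))           # 8
```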
1)
I found that other, more efficient lossless encodings may give smaller files, but they put a much heavier load on the processor while viewing the PDF files, which made scrolling and page turning noticeably less smooth.
2)
I used a Wacom Intuous3 A5 tablet.
Last modified: 2010/09/14 16:02 by admin