nof-digitise Technical Advisory Service

The Digitisation Process

Commissioned from HEDS by UKOLN on behalf of NOF in association with the People's Network

An Information Paper from the NOF Technical Advisory Service

The nof-digitise Technical Standards and Guidelines [1] set out how digitisation should be undertaken in NOF funded projects. This document looks at the essential issues of the digitisation process that should be addressed during the project planning stages and discusses techniques for creating digital files that will conform to the guidelines.

This document is intended to focus attention on the key issues associated with the digitisation process. The advice that it contains is intended as guidance rather than as the only solution to these issues. There may be valid institutional or curatorial reasons for following or setting aside particular aspects of this guidance, especially where the handling requirements of the original materials make certain processes unsuitable for that class of material.

The fundamental issues associated with the digitisation process are as follows:

Know your originals

Having a good knowledge of the contents of the collections that are intended to be digitised will make it much easier to decide on processes and techniques for converting the originals to digital form.

The physical processes required to create a digitised version of an original item depend on many factors, including:

  • The format of the original: is it printed text, photographic material, video, audio etc.?
  • The condition of the original: will it stand up to automated procedures (if used), and will conservation be required before scanning?
  • The size of the original
  • The colour content of the original and whether that colour is important.

For paper and photographic originals, issues to consider include the following:

Photographic media (transparencies, prints, negatives)

What size are the originals? Are they all the same size?

  • It makes for a smoother workflow if items of a similar size are grouped together.

What proportion of the items have colour content? Is it important to capture the colour?

What condition are they in, for example, are they dirty from heavy use?

  • If they are dirty a better scan will be achieved if the items can be cleaned first.

What format are they in?

  • Slides in sleeves or strips will take longer to prepare for scanning and may cost more if a bureau is scanning them.
  • Glass negatives are prone to breakage and require careful handling.

Are the photographs flat or have they bowed?

  • Bowed originals cause difficulties with focus and may need weighting down.

What is the quality of the original?

  • A poor original (e.g. one that is out of focus) will not be improved by scanning.

Paper media

What size are the pages? Are all items the same size?

What general condition is the material in?

  • Pristine pages will produce a better result and it may be possible to automate the scanning process. Any damage in an original may be exacerbated by the scanning procedure.

Can books that are bound be stripped to loose pages for scanning?

  • Scanning from bound volumes is more complex, and therefore more expensive, than scanning from loose pages.

Is there any artwork? If so, is it black and white photographs, colour photographs or line art?

  • Colour scanning is generally more complex and resource intensive.

Is the text size particularly small or large?

  • Very small text may need a higher resolution to extract the information.

Objects require a different approach. Artifacts, art works and sculptures cannot generally be successfully scanned using the techniques available for ‘flat’ media such as photographs. It will therefore be necessary to use photography, either traditional or digital, to get an image of the original.

How much will it cost?

Guidance on the relative cost of various procedures is available from HEDS [2] and [3]. There is also a detailed review in DigiNews [4] by Steven Puglia, of the National Archives and Records Administration.

Remember that real costs and prices are bound to vary from those given in any guidance documents and it is essential that such guidance is used purely as a starting point in accurate costing.

A brief technical overview: creating a digital master

The nof-digitise Technical Standards and Guidelines for creating digitised content recommend that digital preservation issues be observed when producing digital content. A good baseline principle for creating a long-lasting digital file is 'Scan Once for All Purposes': all the complex and expensive preparation work then only needs to be done once.

The guidelines recommend that projects consider the value in creating a fully documented high-quality 'digital master' from which all other versions (e.g. compressed versions for accessing via the Web) can be derived. This 'digital master' file should be created at the highest suitable resolution and bit depth that is both affordable and practical. This master file then becomes the source for every other version of that item that the project will require, such as Web surrogates, versions for high quality printing and so on.

The 'digital master' file will become an archive version of the data – it remains as pure a representation of the original as possible. Ideally more than one copy should be stored on more than one media type and in more than one geographical location, thus providing a degree of protection against data corruption, media failure and physical damage to equipment.
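One way of checking that stored copies of a 'digital master' are still identical is to compare checksums. The following is a minimal Python sketch rather than a prescribed procedure: the guidelines discussed here do not name a method, and the choice of SHA-256 and the function names sha256_of and copies_match are assumptions made for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks
    so that large master files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths):
    """Return True when every listed copy has the same checksum.

    An empty list returns False, since there is nothing to confirm.
    """
    checksums = {sha256_of(path) for path in paths}
    return len(checksums) == 1
```

Running a check of this kind when the copies are first written, and again at intervals, would show when a copy held on one medium no longer matches the others and needs to be replaced.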

'Surrogate' or 'access' versions of the digitised item can be created from the 'digital master' using image manipulation software such as Adobe Photoshop or Paintshop Pro. The nof-digitise Technical Standards and Guidelines give advice on the file formats that are acceptable for the access versions.
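Before an access version is produced it helps to know what pixel dimensions it should have. The Python sketch below works out dimensions that fit within a given box while keeping the proportions of the 'digital master'. The box sizes used (800 x 600 for a screen image, 150 x 150 for a thumbnail) and the function name surrogate_size are illustrative assumptions, not values taken from the nof-digitise guidelines, and the resizing itself would still be carried out in image manipulation software.

```python
def surrogate_size(master_w, master_h, max_w, max_h):
    """Scale master pixel dimensions to fit inside a max_w x max_h box.

    The aspect ratio is preserved and the image is never enlarged,
    because enlarging would add no information from the original.
    """
    scale = min(max_w / master_w, max_h / master_h, 1.0)
    return max(1, round(master_w * scale)), max(1, round(master_h * scale))


# A hypothetical 3000 x 2400 pixel master file.
print(surrogate_size(3000, 2400, 800, 600))  # (750, 600)
print(surrogate_size(3000, 2400, 150, 150))  # (150, 120)
# A master that is already smaller than the box is left at its own size.
print(surrogate_size(400, 300, 800, 600))    # (400, 300)
```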

The CEDARS project [5] gives in-depth information about digital preservation and links to further reading. The National Preservation Office has published Digital Culture: maximising the nation's investment [6], for which a synopsis and details of how to obtain a copy are available.

Resolution and bit-depth

Resolution is usually expressed in dots per inch (DPI) and relates to the density of information that is captured by the scanning equipment. Broadly speaking, the higher the DPI, the more detail is captured. The resolution required to get a useful image of an item is determined by the size of the original, the amount of detail in the original and the eventual use of the data. For example, a 35mm transparency will require a higher DPI than a 5x4 inch print because it is smaller and more detailed. An A4 sized modern printed document that is intended to be processed into searchable text will need less resolution than a similar sized photographic original.

There are also upper limits on resolution. File size is one (increasing the resolution increases the file size) and another is the need to avoid capturing extraneous information. For example, postcards are often printed on poor quality paper and if they are scanned at too high a resolution the texture of the paper will be captured and can obscure the content. There is also a point beyond which putting more resolution into the capture process no longer adds value to the information content of the digital output.
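The relationship between the size of an original and the DPI it needs can be illustrated with a little arithmetic. The Python sketch below multiplies physical size by DPI to get pixel dimensions, and works backwards from a target pixel count to a DPI. The target of 3000 pixels on the long edge is an arbitrary figure chosen for illustration, not a recommended standard, and the function names are likewise assumptions.

```python
import math


def pixel_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of an original (measured in inches) scanned at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_for_target(long_edge_in, target_long_edge_px):
    """Smallest whole DPI at which the long edge reaches the target pixels."""
    return math.ceil(target_long_edge_px / long_edge_in)


# A 5x4 inch print scanned at 600 DPI.
print(pixel_dimensions(5, 4, 600))      # (3000, 2400)

# DPI needed for 3000 pixels on the long edge of a 5x4 inch print...
print(dpi_for_target(5, 3000))          # 600
# ...and of a 35mm frame, whose long edge is 36 mm (36 / 25.4 inches).
print(dpi_for_target(36 / 25.4, 3000))  # 2117
```

Under these assumptions the smaller original needs roughly three and a half times the DPI to yield an image of the same pixel dimensions.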

Suitable resolutions for digital master files for various media types are discussed in the HEDS Matrix [3]. The JIDI Feasibility Study [7] contains a useful table of baseline minimum resolutions according to the type of original material.

Bit-depth relates to the level of colour that will be captured. A 'bit' is a binary digit, and the bit-depth is the number of bits used to record the tonal value of each pixel. As an overview, a 1-bit image is black and white (the pixel has 1 bit and is therefore black or white with no shades in between), an 8-bit image has 256 shades of either grey or colour (2^8 = 256 shades), and a 24-bit image has millions of shades of colour (2^24 = 16,777,216 shades).
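Resolution and bit-depth together determine how large an uncompressed file will be. The Python sketch below is a rough estimate under stated assumptions: it ignores file headers, any padding that a particular file format adds and any compression, and the function names are illustrative only.

```python
def shades(bit_depth):
    """Number of distinct tonal values available at a given bit-depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in, height_in, dpi, bit_depth):
    """Approximate size in bytes of the raw pixel data for one scan."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bit_depth + 7) // 8


print(shades(1), shades(8), shades(24))     # 2 256 16777216

# A 5x4 inch original at 600 DPI in 1-bit, 8-bit and 24-bit.
print(uncompressed_bytes(5, 4, 600, 1))     # 900000
print(uncompressed_bytes(5, 4, 600, 8))     # 7200000
print(uncompressed_bytes(5, 4, 600, 24))    # 21600000

# Doubling the resolution quadruples the amount of pixel data.
print(uncompressed_bytes(5, 4, 1200, 24))   # 86400000
```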

A detailed discussion of resolution, binary and bit depth can be found on TASI's Web pages [8] and a good basic guide to colour capture can also be found on the EPIcentre Web pages [9].

Choosing scanning equipment

Digitisation equipment can be separated into 'contact' and 'no-contact'. 'Contact' equipment, such as a flatbed scanner, requires that the original be flat against the scan bed to get a scanned image. This approach will only work if your original is flat or can be pressed flat without damaging it.

No-contact equipment includes overhead scanners or book scanners and digital cameras that are able to obtain a digital image with the bare minimum of contact with the original.

Choosing the equipment for scanning your originals will depend largely on the characteristics of the collection: in general terms, photographic materials are usually scanned on a flatbed or a transparency scanner while bound volumes and oversized flat materials such as maps and plans require a digital camera or an overhead scanner.

The Feasibility Study for the JIDI project [7] gives information about the type of equipment that is most suitable for broad groups of media types.

If you have a mixed media collection then it may not be possible to use one scanner for everything. A flatbed that is ideal for high speed, high volume paper scanning may not be capable of the resolution required for high quality scans of transparencies. A digital camera studio set-up will be overkill for loose leaf paper scanning and for most general photographic materials.

Generally, make sure that your requirements match the capability of the scanner(s) that you buy. Look carefully at the resolution that the scanner is capable of: it will often be listed with a maximum optical resolution and an interpolated or software resolution. The optical resolution is the figure to look for, because interpolated resolution uses software to 'guess' the values of pixels that are between those that the scanner can optically register. Interpolation should be avoided in an archive-quality scanning exercise. Where resolution is listed as, for example, 600x1200 DPI the maximum optical resolution will be 600.
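The rule of thumb for reading a listed resolution can be written down as a short check. This Python sketch takes the smaller of the figures in a listing such as '600x1200 DPI' as the optical resolution, as described above. It assumes that the listing contains only the resolution figures, and the function names are hypothetical.

```python
import re


def optical_dpi(listing):
    """Return the smaller figure from a listing such as '600x1200 DPI'."""
    figures = [int(number) for number in re.findall(r"\d+", listing)]
    if not figures:
        raise ValueError(f"no resolution figures found in {listing!r}")
    return min(figures)


def meets_requirement(listing, required_dpi):
    """True if the optical resolution is at least the DPI the project needs."""
    return optical_dpi(listing) >= required_dpi


print(optical_dpi("600x1200 DPI"))              # 600
print(meets_requirement("600x1200 DPI", 600))   # True
print(meets_requirement("600x1200 DPI", 1200))  # False
```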

The dynamic range of the scanner is also important. It describes the range of tonal densities that the scanner is able to capture and, generally speaking, the higher it is the better, particularly for dense originals such as photographic prints and transparencies.

A good flatbed scanner is often the keystone of a scanning unit. Flatbeds range in price from tens to thousands of pounds; if this equipment is the key to the success of your project then investing in a good one is essential.

Production-level flatbed scanners usually have either an A4 or an A3 sized scanning area. Larger ones are available but are specialist equipment and therefore rather expensive. In order to choose a flatbed you need to know what size your originals are, whether they are reflective (i.e. light is bounced off them to capture the image, as in photographic prints) or transmissive (light is passed through the original to capture the image, as in transparencies), the resolution and bit depth you will be capturing and the volume of the work.
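Checking the size of the originals against the scanning area is simple enough to automate when a collection has been measured. The Python sketch below uses the nominal ISO paper sizes for A4 and A3 as a stand-in for the scanning area. Actual scan beds differ from model to model, so the figures and the function names are assumptions to be replaced with the manufacturer's specification.

```python
# Nominal ISO paper sizes in millimetres, listed smallest first.
SCAN_BEDS_MM = {"A4": (210, 297), "A3": (297, 420)}


def fits(original_mm, bed_mm):
    """True if the original fits on the bed in either orientation."""
    short_side, long_side = sorted(original_mm)
    bed_short, bed_long = sorted(bed_mm)
    return short_side <= bed_short and long_side <= bed_long


def smallest_bed(original_mm):
    """Name of the smallest listed bed that takes the original, or None."""
    for name, bed_mm in SCAN_BEDS_MM.items():  # dicts keep insertion order
        if fits(original_mm, bed_mm):
            return name
    return None


print(smallest_bed((127, 102)))  # A4   (a 5x4 inch print)
print(smallest_bed((250, 350)))  # A3
print(smallest_bed((500, 700)))  # None (needs larger, specialist equipment)
```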

The software that runs the scanner is also important. It should be straightforward to use, and the ability to run batch scans will save time because the scan bed can be loaded with originals and left to run largely unattended.

The Digital Eyes Web site [10] lists flatbeds by suitability and price.

Colour management software is essential to ensure that the digital representation is as accurate as possible. This can often be purchased with the scanner. RLG DigiNews December 1997 (Vol 3 number 3) has a technical review of colour management software [11] which is a good starting point.

Transparencies can be scanned on a flatbed if it is capable of sufficient resolution and has a transparency adapter fitted that will shine light through the transparency into the scanning head. However, faster and potentially better results will be gained from a dedicated transparency scanner, which scans strips or mounted 35mm negative or positive transparencies at high resolutions. Scanning un-mounted strips or single frame transparencies on a flatbed is difficult and time consuming because they have to be either placed in holders or taped to the scan bed to stop them moving in the heat of the light. Using a transparency scanner can alleviate some of this effort and would be a good investment if 35mm material makes up a considerable part of the collection.

Digital cameras. Digital cameras are being developed for both the home and professional markets and are priced from several hundred to thousands of pounds. 'Home use' cameras are aimed at non-professional users for general casual photography. Listings of home use cameras and their comparative features can be found on the Imaging Resource Web site [12].

There are two kinds of professional digital camera. The first has developed from medical and industrial uses and is a complete unit. In the second, the film in a traditional camera is replaced with sensors that transmit the image to a computer rather than recording it on film; this is known as a digital scanning back. The first type has been around for longer and has been used in imaging projects for several years. Digital scanning backs are being developed for professional photographers as a replacement for traditional film cameras, although they are also being used in project work. One of the advantages of the scanning backs is that they use the lenses and camera body of a traditional professional camera.

Professional digital camera set-ups will generally require the operator to understand the basics of photography and this is a cost that projects need to consider.

The EPIcentre Web site has reviews and feature comparisons of professional level digital cameras [13]. TASI also has a section on digital cameras [14].

Set up an in-house scanning unit or use a bureau?

The materials can either be converted in-house, on specially purchased or existing equipment, or be sent to an external agency or commercial bureau.

Setting up a digitisation unit gives the institution the value of equipment and trained staff for future projects and the movement and treatment of the materials can be closely controlled. Using an external supplier to do the scanning means that the equipment and expertise of a third party can be exploited while the project team concentrates on their specialist area of the project. Using a bureau also means that the cost of buying and maintaining specialist and expensive equipment is not borne by the project.

Both approaches have their merits but there are certain situations where the choices are more clear cut.

Using a bureau: Major reasons for sending materials to a bureau for digitisation rather than attempting to scan them in-house include that the originals cannot be scanned successfully in-house (for example because the equipment required is excessively expensive) or that the intended product is beyond the experience and abilities of the project team (for example because it requires advanced colour management skills). The type of equipment used for scanning items such as bound books or microfilms, for instance, tends to be so expensive that it may be difficult for a project to justify the expenditure, particularly given the short life-span and high maintenance costs of scanning equipment.

Other reasons for outsourcing may include where there is a large volume of work to be done in a short period of time or where the project has space, infrastructure or staffing constraints which preclude the setting up of in-house facilities.

In-house unit: Alternatively, the project manager may decide to use in-house resources for several reasons, including:

  • The collection cannot be moved out of the institution
  • The collection is badly organised (organising it well enough to send to an external supplier would be an excessive overhead)
  • The digitisation needs to be phased in small amounts over a long period
  • The digitisation task is very simple.

It may also be that the project can call on existing staff knowledge and equipment which would mean the project could be done in-house with limited further capital expenditure.

There are some baseline infrastructure requirements for in-house digitisation:

  • A robust production level scanner which will be able to scan the originals to a suitable resolution
  • A powerful PC with lots of memory (at least 256 MB of RAM), or Mac equivalent
  • Plenty of system resources such as backup and write-to-media (e.g. CD-ROM) capacity
  • Software to assist the digitisation
  • Experienced/competent staff to run the equipment and staff to oversee the process and quality assurance.

These requirements assume that the in-house operation aims to come anywhere near the unit prices available from outside agencies.

A further reason why many projects are undertaken in-house is that the staff time, overheads and some consumables such as file storage can often be absorbed by the institution and do not become apparent as a costed factor of the project, thus making this appear to be a cheaper option than outsourcing.
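The effect of absorbed costs on the comparison can be shown with a simple unit cost calculation. In the Python sketch below every figure is invented purely for illustration (the number of items, the cost headings, the amounts and the bureau quote), and the function name cost_per_item is hypothetical. Real costs should come from the project's own budget and from actual quotations.

```python
def cost_per_item(items, *cost_groups):
    """Total of all the given cost groups divided by the number of items."""
    if items <= 0:
        raise ValueError("items must be a positive number")
    total = sum(sum(group.values()) for group in cost_groups)
    return total / items


# Hypothetical figures in pounds, for illustration only.
ITEMS = 10_000
budgeted = {"scanner": 4_000, "software": 1_000}
absorbed = {"staff_time": 12_000, "overheads": 2_000, "file_storage": 1_000}
BUREAU_QUOTE_PER_ITEM = 1.50

apparent = cost_per_item(ITEMS, budgeted)
full = cost_per_item(ITEMS, budgeted, absorbed)

print(f"apparent in-house cost per item: {apparent:.2f}")  # 0.50
print(f"full in-house cost per item:     {full:.2f}")      # 2.00
print(f"bureau quote per item:           {BUREAU_QUOTE_PER_ITEM:.2f}")
```

With these invented figures the in-house option looks cheaper than the bureau only while the absorbed costs are left out of the calculation.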

There is no easy answer to the question of whether to scan in-house or to outsource because it depends so closely on the project team, the institution and the materials.

Choosing a scanning bureau

If the project decides to use an external body to digitise the materials then it is important to carefully look at the available service providers. A good place to find scanning bureaux is in the Cimtech Document Management Guide and Directory [15] which has up-to-date listings for UK suppliers of digitisation bureau services.

Among the most important things that you should ask potential suppliers are:

Can they conform to the nof-digitise Technical Standards and Guidelines?

  • Do not accept an assurance that an alternative format they suggest will do just as well. Get substantial benchmark samples done before you contract with them, and take third party advice if necessary.
  • Give the bureau the URL for the nof-digitise Technical Standards and Guidelines and a copy of this information paper.

Do they have safe storage facilities for the originals away from the production area?

  • If you require it, do they have temperature controlled areas?

Will your originals be worked on by them or do they intend to contract to another supplier?

  • You need to know where your originals are.

Develop a tight specification of requirements and a contract that sets out what you expect from the vendor, including technical procedures, output formats, handling requirements and timescales. Insist that they will rework any data that fails your quality assurance procedures (i.e. that falls outside the requirements of the contract) without further cost.

Some suppliers will claim to be able to handle any type of media that you ask them to tackle but HEDS’ experience is that bureaux tend to specialise in certain types of conversion, for example high volume paper materials or high end colour image based work. Where this is the case they may not be as good at some media or they may outsource those media to a partner bureau. Insist on samples, undertake a vendor assessment or seek third party advice before contracting.

If the work is of such a volume that the job has to go out to tender, bear in mind that the cheapest quote may not be from the best bureau for the job. Ask each of the short-listed bureaux to undertake samples to your required specification and ask them to provide a detailed description of the processes used to achieve the output along with a price. You should then choose the supplier that provides the best value for money in terms of quality, price and the suitability of the conversion procedures.

Concluding remarks

A digitisation project can cover a wide range of complex activities and it is often easy to lose track of the underlying project aims and objectives. Digitisation is a tool and not a purpose and should always be used to facilitate the end result of the project rather than becoming the sole focus of it. It is hoped that this document will help to make the process of digitisation less fearsome and more tangible and therefore something to be harnessed to help to create useful and exciting digitisation projects.

References

[1] nof-digitise Technical Standards and Guidelines
http://www.peoplesnetwork.gov.uk/content/technical.asp
[2] Costing a digitisation project
http://heds.herts.ac.uk/resources/costing.html
[3] The HEDS matrix of potential cost factors
http://heds.herts.ac.uk/resources/matrix.html
[4] DigiNews: The costs of digital imaging projects
http://www.rlg.ac.uk/preserv/diginews/diginews3-5.html
[5] The Cedars project
http://www.leeds.ac.uk/cedars/
[6] Digital culture: maximising the nation's investment
http://www.bl.uk/services/preservation/digcult.html
[7] A feasibility study for the JISC Image Digitisation Initiative (JIDI)
http://heds.herts.ac.uk/resources/papers/jidi_fs.html
[8] TASI: The digital image
http://www.tasi.ac.uk/framework/capture/image.html
[9] The art and science of digital imaging
http://www.epi-centre.com/basics/basics2.html
[10] Digital Eyes: Scanners
http://www.image-acquire.com/scanner/
[11] RLG DigiNews: Review of colour management software
http://www.rlg.org/preserv/diginews/diginews3.html#hardware&software
[12] Imaging Resource: Listings of home use digital cameras
http://www.imaging-resource.com/
[13] EPIcentre: Listings of professional digital cameras
http://www.epi-centre.com/reports/reports.html
[14] TASI: Digital cameras
http://www.tasi.ac.uk/framework/capture/camera.html
[15] Cimtech document management guide and directory
http://www.cimtech.co.uk/cimtech/pub/p1.htm

Acknowledgements

This paper was commissioned from the Higher Education Digitisation Service (HEDS) by UKOLN on behalf of the New Opportunities Fund in association with the People's Network and is one of a series of Information Papers that will be produced by the NOF Technical Advisory Service.

Queries about the Information Papers should be addressed to:

Sally Criddle
UKOLN
The University of Bath
Bath BA2 7AY

Email: s.criddle@ukoln.ac.uk
Telephone: 01225 826250

UKOLN is funded by Resource: The Council for Museums, Archives & Libraries, the Joint Information Systems Committee (JISC) of the Higher and Further Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
