Tuesday, 12 March 2013

World class digitisation in Sweden

I recently attended the Digidaily newspaper project review at the Media Conversion Center (MKC), the National Archives' department for digitisation, in Fränsta, Sweden. Here I describe some of the secrets of the success I observed, together with my review and recommendations for the future directions suggested by this groundbreaking project.

It is a privilege of my working life that I get to see behind the scenes at many great memory organisations the world over, both as an academic and as an advisor. This privilege is made even more pleasurable when the people you are working with are highly professional, smart and welcoming.

Such was the experience of being an external expert at the 6-7 February 2013 final seminar and review at MKC in Fränsta for the Digidaily project. Fränsta is not the easiest place to visit, requiring two flights and a one-hour car journey, but it is certainly worth the trip if you want to see one of the world's finest digitisation production houses.

The folks at MKC (who did the digitisation) and their colleagues in the National Library of Sweden (KB) are wonderful hosts. They also have the unique ability to listen intently to all advice given and never once become defensive if there is a negative comment or constructive criticism. They just absorb it all, ask smart questions so they understand what is needed and then get on with making their processes better. This is something I rarely see and is a culture inculcated by the management at both MKC and the KB that is to be applauded. These folks really know how to collaborate.

Investigating newspaper segmentation
Among the invited guests was an old friend and fellow external expert, Edwin Klijn (The NIOD Institute for War, Holocaust and Genocide Studies, and former project manager at the Koninklijke Bibliotheek, the Netherlands). Edwin is always a pleasure to work with and his experience was essential to our being able to give a well-rounded and supportive assessment of Digidaily. It was also great to meet colleagues from Norwegian and Finnish newspaper digitization projects and to meet the Swedish National Librarian, Gunilla Herdenberg, the National Archivist, Bjorn Jordell, members of Digisam and also Isobel Hadley Kamptz from the Swedish Digitizing Commission.

Getting Right to the Point

My assessment of the Digidaily project is that the production house, the quality assurance, the digitisation workflow, the very high throughput and the balance of costs against productivity that together deliver such a high standard of colour imaging of newspapers are truly world class and amongst the best I have ever witnessed.
The levels of achievement that I saw in this project review astounded me - the teams at the KB and MKC have exceeded even my high expectations for their excellence. I know I am gushing here, so let me tell you why this is impressive and also why there is inevitably more to do.



The Swedish Secrets of Success

1. Culture. They have the right culture for success. It is built upon sharing ideas, co-operation, listening and working very hard to be the best. As is often said: get the right culture and good performance will follow, and from there great results. MKC and KB have one of the best working environments for digitisation I have seen - they have copious space for layout and preparation of the originals, they have an excellent quality assurance suite, they have great ergonomics on the scanning equipment. But most of all there is an attitude of respectful team-working, of pursuing best practice and of working hard at improving their processes and workflows, and this underpins their success.

2. Costs. They know exactly how much everything costs - there is not a process or a step along the way that MKC and KB have not accounted for and placed a cost against. This transparency has allowed them to focus on the right areas where cost savings can be achieved through efficiency gains and also to support investing in those areas where the value is most apparent. I have always been impressed with the high volume throughput achieved at MKC (on average this is about 90-100,000 images per day) despite the very fragile materials they are working with. Usually one would assume that working with fragile and hard-to-handle originals such as newspapers (see images below) would push the costs up, and certainly the preparation costs are higher because of these originals. But the Digidaily project has optimised its workflows to such an extent that they really surprised me with their price per page. The costs are incredibly low for the quality and standards being achieved.

Varied types and quality of newspaper




Superb preparation facilities at MKC



Varied scanners in use to ensure fastest workflow for each type of newspaper

Newspaper segmentation. The different coloured segments show the various articles in the newspaper.

3. Coping with change. The Digidaily project has had to cope with many changes, including: adding segmentation on the page and article level to their workflow, dealing with image quality issues (unwanted artefacts and wavy text), using JPEG2000 as the archival image format and storing everything in colour. Mass digitisation relies on low variation and little change to keep costs down and productivity up. However, the Digidaily project took all these specification changes and technical challenges in its stride and still pushed down costs. They used their excellent workflow system to help increase quality by spotting an artefact caused by dust at the point of scanning rather than at image review, thus reducing rework considerably. By being flexible but always systematising their decisions they have been able to review, renew and change their working practices to meet new requirements.

4. The 7 wastes mantra. I hope I am not giving away any State secrets at this point but one of the key elements to success was ensuring that they always kept an eye on what they termed the "7 wastes":
  • overproduction, 
  • inventory, 
  • waiting, 
  • motion, 
  • transportation, 
  • rework and 
  • over-processing

Future Directions for Swedish Newspapers

It all sounds fabulous: digitising millions of pages in full colour to amazing specifications at a very low cost per page. However, there are a few things missing right now and there are future considerations that are worthy of comment. As a digitisation project and process it is world class, but there are aspects of the activity that have some way to go to reach the same heady heights.

Access: Right now there isn't a user interface for Digidaily that is worthy of the original materials or the images and metadata that have been created. There is a huge opportunity here for the KB to showcase its content to best effect and to create a wonderful user experience by making full use of the detailed metadata and segmented articles in the digitised newspapers. I pointed to the National Library of Wales newspaper programme which will, in all honesty, not have such sumptuous images as Digidaily, but has an excellent interface that really supports the user and will deliver an open API so that others can also build upon the content. An important next step for the KB and Digidaily is to think about how they want to present their content to their audience.

Segmentation: this is a relatively new area that MKC are still getting to grips with in terms of quality assurance and workflow. It is running very efficiently at present but there are some niggles still to be addressed and I am sure MKC will resolve these.

Programme not projects: I am also recommending that the KB commit to a longer term programme of digitisation. The cost savings and efficiencies gained in Digidaily will soon be lost if they have to stop, then start, then stop again due to a project-led means of funding and managing the digitisation. If the KB can drive towards a programme of work then not only can current efficiencies be secured but I believe that new efficiencies will be achieved as well.

Effectiveness: At times the relationship between MKC and KB has focussed on efficiency and they have worked tirelessly to find the most efficient ways of achieving excellent image results. One area I would like to see them give more focus to in the future is the issue of effectiveness. Being effective is about ensuring that those efficient workflows are delivering the most valuable outcome possible. At present the KB have gained some information about their user communities and what success looks like for them. If they can work more on the desired impacts and evaluate their communities in even more detail they will be able to deliver the most effective outcomes.

Copyright: I could rant about how daft the IPR/copyright attitude is in Sweden where they basically can't digitise beyond 1862 (YES 1862) without fear of breaching copyright, on the basis that someone could live to 90 and then you have to add 70 years on from there. Pretty much everyone else in Europe is finding the 1910s a nice stopping point for copyright works... It's a bonkers situation that Swedish memory institutions find themselves in, but that's probably not something the KB can easily change.
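
For anyone who wants to see how a date like 1862 can fall out of that reasoning, here is a rough sketch in Python. It is only one possible reading of the arithmetic, not an official rule: it assumes protection runs for 70 years after the author's death and expires at the end of that calendar year, that a contributor could live to 90, and (purely as an illustrative assumption, not a figure from the KB) that the youngest plausible contributor was 10 years old at publication. The function name and its parameters are hypothetical.

```python
def latest_safe_publication_year(current_year, term_years=70,
                                 max_lifespan=90, min_age_at_publication=10):
    """Latest publication year that is certainly out of copyright.

    Worst case: the youngest plausible author, who then lives the longest.
    All defaults are illustrative assumptions, not official figures.
    """
    # Protection expires at the end of (death year + term_years), so the
    # author must have died at least term_years + 1 calendar years ago.
    latest_death_year = current_year - term_years - 1
    # Worst case: the author lived the full assumed maximum lifespan.
    latest_birth_year = latest_death_year - max_lifespan
    # ...and was as young as we allow when the newspaper was printed.
    return latest_birth_year + min_age_at_publication


if __name__ == "__main__":
    print(latest_safe_publication_year(2013))
```

With these assumptions a review date of 2013 gives 1862. Raising the assumed age at publication to 20 moves the cut-off to 1872, which shows how sensitive the date is to the assumptions chosen.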


Thank you's

I'd like to thank everyone from MKC and KB for their openness and for just being clever, hard-working and passionate about what they do. This independent blog post reflects my own opinions and thoughts throughout but was written with their permission and I am glad they have let me share some of their "secrets". If you want to know more (and there is more) then get in touch with them and find out from a world leading team how it should be done.

I'd specifically like to thank the following for their support and friendship:-
Mikael Andersson, Stina Degerstedt, Annelie Eriksen, Anna-Karin Garnes, Joar Hedtjärn, Klas Jadeglans, Torsten Johansson, Daniel Jonsson, Maud Rahmqvist, Heidi Rosen and Anders Udd.

7 comments:

  1. Impressive stuff!

  2. Omitting access from the project plan is the major disaster. When the data is finally released somewhere in 2023 or 2033, or maybe 2043, who knows, it might turn out to contain errors that could have been fixed in an early stage, but then it will be too late.

  3. Great report! Do you know of any documentation on the various "stopping points" other organizations have decided on across europe?

  4. Access to the material is planned for Q2 2014. Yes, it is much later than most of us at the National Library of Sweden have wished for. The upside of the delay is that the library finally will be building the interface within the framework of the national union catalogue, which promises a much more stable and long term solution and also an interface not just for newspapers but for all digital material (digitized and born digital) that the library will receive through e-legal deposit and produce in-house or in co-operation with other institutions.

  5. Great, thanks for the good insight here, keep posting more

  6. They have quite the setup there. Digitization still remains a labor-intensive project for the most part, and coming from someone who had undergone the process, I salute you. I’m not really concerned with the delayed release of the digitized information, as it really takes time to finish everything and properly catalog and lay out the metadata for each and every paper that was scanned, down to the last article, if they’re that meticulous (and from your post, they seem to be).

    Ruby Badcoe

  7. Hi Simon - Nice report! It took some time for me to see it though. Anyway, since they are using our software (Zissor) for the Automatic segmentation and OCR, is it OK for you if I link to this page from our home page (www.zissor.com) ?

    They are really professional at MKC - and for us it is a pleasure to work with them. I also like their idea of focusing on automation and volume before a lot of manual steps in the post-processing.
