Friday, September 17, 2010

Problems with Pandora

For the benefit of readers who do not know what Pandora is, it is Australia's National Library archive of internet material, including blogs.

Some time ago, I emailed them several times suggesting that some of my blogs might be included, but without response. I accept that the things I write about are either not of national significance or, perhaps, not sufficiently representative to warrant inclusion. However, this did get me thinking.

As someone who monitors the blogosphere on a daily basis, I can see that it must be very hard for librarians and archivists sitting there in Canberra to form a view and decide what to include from a world that constantly shifts and is so ephemeral.

Some of the group blogs like Club Troppo make fairly logical candidates. Even then, there can be an issue as to how often to update. The practice seems to be to update on an annual basis.

Pandora does have guidelines as to what might be included. These state in part:
PANDORA is a selective archive. The National Library and its partners do not attempt to collect all Australian online publications and web sites, but select those that they consider are of significance and to have long-term research value.
The problem, as I see it, lies in determining what is significant and what has longer-term research value. After all, this can only be determined later. If we look at the National Library's own selection criteria (other participating agencies have their own criteria), we find the following priority definitions:
Archiving will focus on the following categories:
(i) Commonwealth and ACT government publications (State government publications will be left to the State libraries)
(ii) Publications of tertiary education institutions
(iii) Conference proceedings
(iv) E-journals
(v) Items referred by indexing and abstracting agencies (which frequently are from the first three categories but also include items with print versions)
(vi) Topical sites:

(a) Sites in nominated subject areas (see Appendix 2) that would be collected on a rolling three year basis; and
(b) Sites documenting key issues of current social or political interest, such as election sites, Sydney Olympics, Bali bombing
More specific selection guidelines for each of these categories are detailed in Section 5.
3.6 This approach will not preclude us from collecting any site of a high standard and long-term research value, regardless of subject, format, or publication type. But we will give priority to the categories listed above and subjects detailed in Section 5. This means that some categories currently being collected will not be given priority but will be collected only as resources allow.

We also find the following exclusions:

The following categories will generally not be collected, though exceptions may be made.
  • Cams (web sites employing a web camera that uploads digital images for broadcast)
  • Datasets (5)
  • Discussion lists, chat rooms, bulletin boards and news groups
  • Drafts and works in progress, even if they otherwise meet the selection guidelines
  • Games
  • Individual articles and papers
  • News sites
  • Online daily newspapers for which print versions exist
  • Organisational records
  • Portals and other sites that serve the sole purpose of organising Internet information
  • Promotional sites and advertising
  • Sites that are compilations of information from other sources and are not original in content
  • Theses (the responsibility of universities and the Australian Digital Theses Project).
Now I'm simply not sure of this categorisation. I'm not saying it's wrong, simply that I'm not sure.
Let me take an example. News sites are not included. Who, then, is responsible for archiving these? Increasingly, the on-line sites include comments sections that (and here I am wearing my historian's hat) provide a useful snapshot of views. You obviously don't get these in the archived print version.

A second example: how does one define long term research value?

Research interests vary. The things that I am interested in, for example, were quite popular forty years ago and may now even be coming back into partial vogue. In the meantime, few people were interested. I doubt that any of the Pandora crew would place the things that I write about in their long-term category.
So all this leads me to a question for my fellow bloggers. What do you think Pandora should preserve?

