STUDY TO IDENTIFY MEASURES NECESSARY FOR A SUCCESSFUL TRANSITION
TO A MORE ELECTRONIC FEDERAL DEPOSITORY LIBRARY PROGRAM (FDLP)
PRELIMINARY REPORT: TASK 8D: OTA WEB SITE
As part of the Study, a task force examined the issues that must
be addressed when an agency no longer makes electronic
information dissemination products and services available at its
Web site, and the site contains information that needs to
remain available to the public through the FDLP and/or be
transferred to the National Archives and Records Administration
(NARA). This task force was led by Fynnette Eaton and Tom
Brown, NARA Center for Electronic Records.
This preliminary report of the task force is being made available
for review and comment. Comments should be submitted by Friday,
February 16, 1996, by internet e-mail to study@gpo.gov, by fax to
FDLP Study at (202) 512-1262, or by mail to FDLP Study, Mail Stop
SDE, U.S. Government Printing Office, Washington, DC 20401.
*****************************************************************
TASK 8D: Identify issues that must be addressed when an agency
no longer makes electronic information dissemination
products and services available at its Web site, and
the site contains information that needs to remain
available to the public through the Federal Depository
Library Program (FDLP) and/or be transferred to the
National Archives and Records Administration (NARA).
BACKGROUND
The use of Web sites as a means to disseminate information is
becoming increasingly common among Government agencies. It is
also likely that agencies will begin to use their Web sites to
distribute information not available in any other format. These
Web sites are in essence forms of publication and therefore may
be Federal records as defined by 44 USC 3301. However, the ease
with which these sites can be established and modified creates
problems for both the Government Printing Office (GPO) and the
National Archives and Records Administration (NARA), which
share an interest in identifying and preserving the valuable
information on these Web sites.
GPO and NARA have dissimilar, but complementary, goals to assure
public access for the full life cycle of this information. GPO
must address measures that ensure continued short-term access (5
years minimum) for much of the information on the Web sites.
NARA focuses narrowly on that portion of the information which
has historic value and its goal is to assure long-term access
(indefinitely) to that information. Records schedules can serve
as a tool for identifying these sites, but GPO and NARA will have
to work together to create ways in which information can be
transferred without added burden to publishing agencies.
Issues concerning short and long-term access to information on
agency Web sites were brought to the forefront by the closing of
the Office of Technology Assessment (OTA) on September 29, 1995.
OTA's Web site, OTA Online, included a catalog of all the reports
produced by OTA from 1972 to 1995, ASCII text files of the 1994
reports, and ASCII as well as ADOBE Portable Document Format
(PDF) texts of the 1995 reports. The 1995 reports include some
reports that will not be formally published. OTA made
arrangements to mount information from OTA Online on GPO's
Web site. The final transfer to GPO will be sometime in February
1996. Since November 1, 1995, the OTA Web site also has been
mirrored by the National Academy of Sciences and the Woodrow
Wilson School of Public and International Affairs at Princeton
University. OTA also has a contract to scan the texts of all its
reports dating back to 1972 into PDF format. The PDF files will be packaged
along with much of the information available via OTA Online, and
some additional historical material, on a set of five CD-ROMs.
The CD-ROM collection will be sold through GPO.
FEDERAL DEPOSITORY LIBRARY DISTRIBUTION
Most of the OTA information available in electronic format is
available in other formats through the FDLP. The only exceptions
are the reports and/or summaries that are still being completed
and will not be formally published.
DISSEMINATION ALTERNATIVES
Alternative A
GPO will mount the information from OTA on its own Web site for
depository library access. When available, both ASCII and PDF
files will be offered. The CD-ROM collection of OTA reports will
be distributed to depository libraries upon completion.
Benefits
Public access to the information is maintained through the
FDLP.
A variety of methods are available for accessing OTA
information.
More depository libraries are equipped to use CD-ROMs than
have Web access for the public.
Disadvantages/Problems
Some OTA information is distributed to depository libraries
in three different formats: paper, CD-ROM, and online through
the GPO Web site. This is not consistent with the Transition
Plan for the FDLP which proposes eliminating all dual
distribution.
GPO incurs additional costs for maintaining the information
on its Web site. OTA is responsible only for the costs
related to the initial mounting of the information.
Reports that have been scanned are not entirely searchable.
Although the reports will be scanned using Adobe Acrobat
Capture, which will convert them to machine readable form,
non-recognizable portions will be retained as images. In
addition, due to time constraints, the scanned reports will
not be reviewed.
PDF is software dependent and therefore not an acceptable
format for long term retention.
Alternative B
The OTA CD-ROMs would be distributed to depository libraries.
After a predetermined period of time, OTA information will be
removed from the GPO Web site.
Benefits
Public access to the information is maintained through the
FDLP.
More depository libraries are equipped to use CD-ROMs than
have Web access for the public.
Dual distribution in electronic format is eliminated.
Disadvantages/Problems
Scanned reports contain non-searchable portions and are not
reviewed.
The CD-ROMs cannot be archived because they use PDF, a
software-dependent format. [See above.]
Public access to the reports is available only at depository
libraries, although as mentioned there are two other private
Web sites that will be providing this information for at
least a period of time.
ISSUES TO BE ADDRESSED (FDLP)
Archival Responsibilities
GPO will coordinate with NARA to transfer electronic
information which no longer warrants maintenance for the
FDLP to NARA. If GPO places agency data on a server and
makes it available via GPO Access, then the data becomes
part of GPO records and GPO will be responsible for its
disposition (or transfer) to NARA. If an agency has
maintained electronic Government information and GPO points
to the information for the FDLP, it will be the legal
responsibility of the individual agency to transfer this
information to NARA.
GPO and NARA will need to determine whether statutory
changes are needed to clarify each agency's respective
roles and responsibilities for extended access and
preservation of electronic information dissemination
products and services.
Life-Cycle of Electronic Information Dissemination Products and
Services
GPO and NARA will need to define a life-cycle for electronic
information dissemination products and services, beginning
with the original document as an electronic file and ending
with its disposition. It is NARA's responsibility to
determine whether an electronic information dissemination
product warrants permanent retention or no longer warrants
continued preservation by the Government.
In accordance with the goal of providing extended access,
GPO will assume such costs as data preparation for mounting,
maintenance, storage, and ongoing costs to minimize
deterioration and assure technological currency.
Format Standards
GPO plans to receive electronic information provided by
agencies in any format. However, GPO needs to address the
prospect of determining a small number of "recommended
standard formats" for agency information, prior to receipt.
Also, GPO will need to develop standards for formats of data
that have been received and need to be mounted on GPO Access
for public availability. It is anticipated that certain
electronic source files provided to GPO by agencies will not
readily lend themselves to GPO Access in their original
formats. Steps may need to be taken to make information
received in these types of formats more suitable for
extended access.
GPO will offer this information to NARA once it is
determined that usage no longer warrants maintaining the
information at a GPO authorized site. This does not imply
that GPO will assume the responsibility of converting this
information for NARA if the file format used for extended
access through GPO Access is not suitable for the
preservation requirements of NARA. It is expected that GPO
may hold electronic information whose usage no longer warrants
maintenance but which NARA will not accept because of file
formats. GPO and NARA must seek to coordinate their
efforts to assure that format standards used by GPO for
extended access to electronic information can be converted
easily to formats acceptable to NARA.
Software Dependent Information
Some electronic information dissemination products and
services produced by agencies in particular formats (such as
certain types of spreadsheet files) are embedded with file
structures that only have intrinsic value when used with
particular software. If this information is converted to
another generic format, such as ASCII, it loses value for
the user. This poses a concern for GPO, which will need to
make this information available via GPO Access, and NARA,
which currently will not accept electronic information that
is software dependent.
ARCHIVAL BACKGROUND
The OTA Web Site contains two main types of information: 1)
Organizational Structure and Members, and 2) Publications. The
organizational structure, lists of Technology Assessment Board
(TAB) and Technology Assessment Advisory Council (TAAC) members,
can be found in the annual reports of OTA, which are scheduled
for permanent retention under N1-444-94-1. Additional
information on the members' work with OTA is scheduled as
permanent in TAB/TAAC Member Files. The site also contains
information on ongoing projects (moot issue), how to contact
the staff, different online methods of obtaining publications,
and links to other government sites.
All of the information in the OTA Web Site has been scheduled in
a variety of different records covered by different items in the
schedule. However, the schedule does not directly apply to the
OTA Web Site. The OTA Web Site can be viewed as another
"publication" used by OTA to disseminate information. The
existence of the Web Site, as well as its content, provides
evidence of the image OTA wanted to portray to the public and the
work it accomplished. Even though the information exists, in
bits and pieces, among the records of OTA (records covered by the
schedule), by bringing this information together, and "packaging"
it in a different way, OTA has created a different record that is
not covered in the schedule. Thus, the OTA Web Site should be
scheduled as an item under the office that manages and maintains
the Web Site.
In FY 1995, the National Archives, Center for Electronic Records,
scheduled and appraised the ASCII text files of the 1994 and 1995
reports (N1-444-94-1). These ASCII files were appraised as
temporary because they do not contain the graphs, charts, and
photographs which are integral to the publication, thus
diminishing their value. At present, the Center for Electronic
Records will not accession files that are dependent on any
specific software package. This is referred to as software
dependence. This precludes the Center from accessioning the
reports produced using ADOBE software. For these reasons, NARA
has chosen to maintain the print formats of all the reports
produced by OTA. However, NARA will accession the ASCII text
file for the Catalog of Publications, 1972-1995 (N1-444-96-1).
This file is used to upload the Catalog onto the OTA Web Site. In
the case of OTA electronic information, NARA will accession only
the ASCII file used to create the Catalog of Publications,
1972-1995. Since OTA is able to send the file in the software
independent format specified in 36 CFR 1228.188, OTA will
transfer the file directly to NARA, Center for Electronic
Records.
NARA also will receive electronic versions of the OTA reports in
three different formats: ASCII, Hypertext Markup Language (HTML),
and PDF. These files will not be accessioned by NARA, but
will be used to examine technical issues of the different
formats. However, NARA may retain for a limited time the HTML
and/or PDF format as an extra copy for convenience of reference.
HTML files are essentially ASCII files that contain text which
is "tagged" using a standardized language. HTML was created as a
standardized way to format documents so that they could be read
and interpreted by a variety of different computer platforms. These
commands are written using ASCII characters. Any word processing
software package can be used to tag a document with HTML
commands.
There are also software packages developed specifically to
"mark up" documents with HTML commands. If a tagged document is
printed out, the HTML commands are visible along with the text of
the document. Therefore these files are software independent and
can be treated as ASCII files. If needed, PDF files also can be
converted to ASCII. Although all these files are or can be
converted into software independent files, the original
reports contain graphics, which cannot be software independent.
PDF files contain graphics and the HTML files contain links to
graphics. That is, the graphics "reside" elsewhere, not in the
tagged document.
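
As an illustration only, the following minimal sketch shows why a
tagged file can be treated as ASCII text while its graphics cannot.
It uses Python's standard html.parser module to separate the plain
text of a tagged document from the addresses of the graphics it
points to. The class name, function name, and sample document are
hypothetical and are not drawn from OTA Online.

```python
from html.parser import HTMLParser


class TaggedTextExtractor(HTMLParser):
    """Separate the plain text of an HTML file from the graphics it points to."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # character data found between tags
        self.image_refs = []   # values of <img src="..."> attributes

    def handle_starttag(self, tag, attrs):
        # Graphics are referenced by address; they are not stored in the tagged file.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_refs.append(value)

    def handle_data(self, data):
        # Keep only non-empty runs of text.
        if data.strip():
            self.text_parts.append(data.strip())


def split_text_and_graphics(html_source):
    """Return (plain text, list of referenced graphics) for one tagged document."""
    parser = TaggedTextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.text_parts), parser.image_refs


# Hypothetical sample document, for illustration only.
sample = (
    "<html><body><h1>Annual Report</h1>"
    "<p>See the chart below.</p>"
    '<img src="chart1.gif"></body></html>'
)
text, graphics = split_text_and_graphics(sample)
print(text)
print(graphics)
```

The text survives as ordinary characters, but the graphic survives
only as an address; the image file itself would have to be obtained
and preserved separately.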
APPRAISAL CONSIDERATIONS
What information is in the Home Page, and which files (and
addresses) does it link to? What is the structure/"hierarchy" of
the Site?
There is a distinction between a Home Page and a Web Site. A
Home Page is the first "page" of a site. It usually
contains an introduction or welcome statement. This Home
Page provides links to other pages. There are two main types
of links: a) links to other files (pages) in the same
location, and b) links to other Web sites. A Web Site can
be described as the sum of a Home Page and all the files
that are linked to it. It is important to determine which
file is the Home Page and trace how other pages are linked
to the Home Page and other pages. The structure of the page
can provide evidence as to what the agency feels its primary
mission is and how it wants to portray itself to the general
public.
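
The two types of links described above can be told apart
mechanically. The following is a rough sketch, not a prescribed
method: it assumes that "the same location" can be approximated by
the same host name, and the function name and example addresses
are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def classify_links(home_page_url, hrefs):
    """Sort links into those in the same location and those to other sites."""
    # Assumption: "same location" is approximated by "same host name".
    home_host = urlparse(home_page_url).netloc.lower()
    same_location, other_sites = [], []
    for href in hrefs:
        absolute = urljoin(home_page_url, href)  # resolve relative addresses
        if urlparse(absolute).netloc.lower() == home_host:
            same_location.append(absolute)
        else:
            other_sites.append(absolute)
    return same_location, other_sites


# Hypothetical addresses, for illustration only.
home = "http://agency.example/index.html"
links = ["reports/1995.html", "/staff.html", "http://other.example/data.html"]
internal, external = classify_links(home, links)
print(internal)
print(external)
```

Repeating this step for each page in the first list would trace how
the remaining pages are linked to the Home Page and to one another.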
Need to determine criteria/"draw lines" to limit the "links" that
will be appraised.
In appraising a Site it is necessary to examine the Home
Page and the files that are linked. However, the links to
other sites should be appraised with the records of the
agencies that maintain those sites. If there is a link to a
site which maintains information for the site being
appraised, and the agency (of the records being appraised)
is responsible for the content, then that particular link
should be considered for appraisal. This does not mean that
a whole new site is to be appraised along with the first
site. A precedent for this can be found in N1-149-95-1P,
Item 20.8, VAX Client Server, memo from NSXA to NIR dated
January 9, 1995: "[Electronic Photocomposition Division
(EPD)] uploads the publications, which they receive on tape
or disk. EPD is not responsible for the creation or content
of the publications. The individual agencies that send the
publications to be uploaded into the system are
responsible for all the data and information. For these
reasons, the files in the VAX Client Server should not be
appraised as GPO records..."
Which files within a site should be accessioned? Do all the files
need to be brought in? Is it adequate to simply document that a
particular link contained certain information which can be
obtained among the other records of the agency? If there are
links to other sites, should the name and agency that maintained
each site be documented?
The determination of specific files in a Web Site that
should be accessioned and which links should be documented
or appraised must be done on a case by case basis.
APPRAISAL ALTERNATIVES
Alternative A
Accession the records of the persons or committees responsible
for maintaining the Web Site. The records of these persons or
committees should reflect the content and structure of the site.
In fact, these files serve as documentation of the electronic
files posted on the Web Site. Thus, the information that appeared
on the Web Site could be reconstructed. In this case, we would
be documenting the existence of a Web Site without actually
accessioning the information on the Web Site.
Benefits
This approach avoids the duplication of information NARA
would be accessioning. The information provided by the
persons or committees in charge of the Site, would provide
researchers with evidence of the information which was
posted and they would then search out the desired documents
from the records of that agency. This would be especially
true of larger agencies which strictly control the
information on their Web sites.
Disadvantages/Problems
Not all agencies have a centralized place where this
information can be found. In smaller agencies, the Web
sites might be constructed and maintained by interns or
interested personnel, yet their records may not provide
adequate information on the content and structure of the Web
Site.
This option also ignores the possibility that in the future,
the information posted on the Web site might not appear in
any other format. In these cases, it is necessary not only
to appraise the records of those maintaining the files, but
the files on the Web site itself.
Alternative B
Accession all the files within the Web Site. These could be
viewed through a browser. However, it is important to note that
different browsers will "interpret" the HTML commands
differently. Also, most Web sites contain links to graphics and
other sites, so those links or graphics would not be
functional. In this case, the links can be documented by
identifying the institution maintaining that site and providing a
brief description of the content of those sites.
Benefits
The Web site can be preserved in a form that researchers
will be able to "navigate" through. Researchers
would also get a better idea of the structure of the site.
Disadvantages/Problems
At the moment, graphics, an integral part of most Web
sites, cannot be preserved.
The sheer size of some Web sites and the number of links
that must be accounted for make them difficult to document.
The possibility exists for duplicating information that
already exists among the records of the agency.
Alternative C
Accession selected files from the Web Site, as well as preserving
the records of the persons, offices, or committees maintaining
the Site. Valuable files, which may not exist in any other
format, or are more valuable in electronic format can be
preserved. These files could be either requested from the agency
without HTML markup (in plain ASCII) or NARA could maintain
the markup.
Benefits
This approach ensures the preservation of unique files or
valuable information without the burden of accessioning the
whole site.
Disadvantages/Problems
In accessioning select files, it is important to document
the context. The documentation package would include
technical information, but also information on the content
of the site where the selected file was originally placed.
Web sites are always changing. Files can easily be added,
updated, and deleted. This poses a problem for accessioning
files in a Web site. The solution proposed in the "Preserving
Digital Information: Draft Report of the Task Force on Archiving
of Digital Information" (August 24, 1995) is to take "periodic
snapshots" of the pages in a site. Ultimately, the agency is
responsible for scheduling the files in their Web site. NARA can
work with the agency to develop a strategy for accessioning files
which constantly are being changed.
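
One way to picture the "periodic snapshots" approach is sketched
below. It is an illustration under stated assumptions, not the Task
Force's method: each snapshot records a SHA-256 digest of every
file, and two snapshots are compared to list the files that were
added, updated, or deleted in between. The function names, file
names, and contents are hypothetical.

```python
import hashlib


def take_snapshot(files):
    """Map each file name to a SHA-256 digest of its content (bytes)."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in files.items()}


def compare_snapshots(earlier, later):
    """Report files added, updated, or deleted between two snapshots."""
    added = sorted(set(later) - set(earlier))
    deleted = sorted(set(earlier) - set(later))
    updated = sorted(name for name in set(earlier) & set(later)
                     if earlier[name] != later[name])
    return {"added": added, "updated": updated, "deleted": deleted}


# Hypothetical site contents at two points in time.
january = take_snapshot({
    "index.html": b"<h1>Welcome</h1>",
    "projects.html": b"<p>Ongoing projects</p>",
})
february = take_snapshot({
    "index.html": b"<h1>Welcome to the agency</h1>",
    "catalog.html": b"<p>Catalog of publications</p>",
})
changes = compare_snapshots(january, february)
print(changes)
```

A comparison of this kind could help an agency and NARA decide how
often snapshots are needed for a site that changes frequently.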
ISSUES TO BE ADDRESSED
Identifying Information for Preservation
How can Web sites with valuable information be identified?
Federal agencies are creating a large number of Web sites.
Once agencies are no longer interested in maintaining that
information, there is no mechanism in place to preserve that
information for future users. Both GPO and NARA share an
interest in preserving this information for future use.
However, as Federal records, the Web sites must be scheduled
along with other agency records. Therefore, records
schedules could serve as a tool to identify valuable
Government information on Web sites.
Transfer of Information to GPO and NARA
Once identified, what information from the Web sites should
be transferred? As explained earlier, GPO and NARA have
different goals. Each agency will have to decide what
information on the Web sites will be of value to their
customers. Sometimes both agencies will be interested in the
same information. However, GPO is primarily interested in
providing information for short-term access. Since NARA is
interested in maintaining indefinitely information with
historic value, it needs to apply criteria for determining
which information from Web sites warrants continued
preservation by the Government. How should this information
be transferred to GPO and/or NARA without added burden to
the agencies? GPO and NARA will have to work together to
identify ways in which agencies can transfer the information
without an added burden.
Extended FDLP Access to Electronic Information Dissemination
Products and Services
What is the most cost-effective and useful method for
preserving FDLP access to electronic Government information
available from agency Web sites or online services? The
maintenance and migration of electronic information over a
period of years can be very costly. If information already
has been distributed in paper, microfiche or CD-ROM does it
make sense to provide continued online access to the
information? If an agency decides to discontinue access to
information through its Web site, does GPO have a
responsibility to obtain the information and provide funds
and resources for its continued access through the FDLP?
Differences Between the Life-Cycle of Information Dissemination
Products and Services in Electronic vs. Traditional Formats
How is the life-cycle for electronic information different
from that of traditional formats like paper and microfiche?
What part of the information dissemination process must be
changed in order to ensure extended access and the
archivability of information on agency Web sites?
*****************************************************************
Judy Russell <jrussell@gpo.gov>