FDLP Study: Task 8D: OTA Web Site

From: Judith C. Russell (jrussell@gpo.gov)
Date: Fri Feb 02 1996 - 12:56:07 PST


STUDY TO IDENTIFY MEASURES NECESSARY FOR A SUCCESSFUL TRANSITION
TO A MORE ELECTRONIC FEDERAL DEPOSITORY LIBRARY PROGRAM (FDLP)

PRELIMINARY REPORT: TASK 8D: OTA WEB SITE

As part of the Study, a task force examined the issues that must
be addressed when an agency no longer makes electronic
information dissemination products and services available at its
Web site, and the site contains information that needs to
remain available to the public through the FDLP and/or be
transferred to the National Archives and Records Administration
(NARA). This task force was led by Fynnette Eaton and Tom
Brown, NARA Center for Electronic Records.

This preliminary report of the task force is being made available
for review and comment. Comments should be submitted by Friday,
February 16, 1996, by internet e-mail to study@gpo.gov, by fax to
FDLP Study at (202) 512-1262, or by mail to FDLP Study, Mail Stop
SDE, U.S. Government Printing Office, Washington, DC 20401.

*****************************************************************

TASK 8D: Identify issues that must be addressed when an agency
          no longer makes electronic information dissemination
          products and services available at its Web site, and
          the site contains information that needs to remain
          available to the public through the Federal Depository
          Library Program (FDLP) and/or be transferred to the
          National Archives and Records Administration (NARA).

BACKGROUND

The use of Web sites as a means to disseminate information is
becoming increasingly common among Government agencies. It is
also likely that agencies will begin to use their Web sites to
distribute information not available in any other format. These
Web sites are in essence forms of publication and therefore may
be Federal records as defined by 44 USC 3301. However, the ease
with which these sites can be established and modified creates
problems for both the Government Printing Office (GPO) and the
National Archives and Records Administration (NARA), which
share an interest in identifying and preserving the valuable
information on these Web sites.

GPO and NARA have dissimilar, but complementary, goals to assure
public access for the full life cycle of this information. GPO
must address measures that ensure continued short-term access (5
years minimum) for much of the information on the Web sites.
NARA focuses narrowly on the portion of the information that
has historic value; its goal is to assure long-term
(indefinite) access to that information. Records schedules can
serve as a tool for identifying these sites, but GPO and NARA will
have to work together to devise ways in which information can be
transferred without added burden to publishing agencies.

Issues concerning short and long-term access to information on
agency Web sites were brought to the forefront by the closing of
the Office of Technology Assessment (OTA) on September 29, 1995.
OTA's Web site, OTA Online, included a catalog of all the reports
produced by OTA from 1972 to 1995, ASCII text files of the 1994
reports, and both ASCII and Adobe Portable Document Format (PDF)
texts of the 1995 reports. The 1995 reports include some reports
that will not be formally published. OTA made arrangements to
mount information from OTA Online on GPO's Web site; the final
transfer to GPO will take place sometime in February 1996. Since
November 1, 1995, the OTA Web site also has been mirrored by the
National Academy of Sciences and the Woodrow Wilson School of
Public and International Affairs at Princeton University.

  
OTA also has a contract to scan the texts of all its reports,
dating back to 1972, into PDF format. The PDF files will be packaged
along with much of the information available via OTA Online, and
some additional historical material, on a set of five CD-ROMs.
The CD-ROM collection will be sold through GPO.

FEDERAL DEPOSITORY LIBRARY DISTRIBUTION

Most of the OTA information available in electronic format is
available in other formats through the FDLP. The only exceptions
are the reports and/or summaries that are still being completed
and will not be formally published.

DISSEMINATION ALTERNATIVES

Alternative A

GPO will mount the information from OTA on its own Web site for
depository library access. When available, both ASCII and PDF
files will be offered. The CD-ROM collection of OTA reports will
be distributed to depository libraries upon completion.

Benefits

  Public access to the information is maintained through the
    FDLP.

  A variety of methods are available for accessing OTA
    information.

  More depository libraries are equipped to use CD-ROMs than
    have Web access for the public.

Disadvantages/Problems

  Some OTA information is distributed to depository libraries
    in three different formats: paper, CD-ROM, and online through
    the GPO Web site. This is not consistent with the Transition
    Plan for the FDLP, which proposes eliminating all dual
    distribution.

  GPO incurs additional costs for maintaining the information
    on its Web site. OTA is responsible only for the costs
    related to the initial mounting of the information.

  Reports that have been scanned are not entirely searchable.
    Although the reports will be scanned using Adobe Acrobat
    Capture, which will convert them to machine-readable form,
    non-recognizable portions will be retained as images. In
    addition, due to time constraints, the scanned reports will
    not be reviewed.

  PDF is software-dependent and therefore not an acceptable
    format for long-term retention by NARA.

Alternative B

The OTA CD-ROMs will be distributed to depository libraries.
After a predetermined period of time, OTA information will be
removed from the GPO Web site.

Benefits

  Public access to the information is maintained through the
    FDLP.

  More depository libraries are equipped to use CD-ROMs than
    have Web access for the public.

  Dual distribution in electronic format is eliminated.

Disadvantages/Problems

  Scanned reports contain non-searchable portions and are not
    reviewed.

  The CD-ROMs cannot be archived because they use the PDF
    software-dependent format. [See above.]

  Public access to the reports is available only at depository
    libraries, although, as mentioned above, two private Web
    sites will be providing this information for at least a
    period of time.

ISSUES TO BE ADDRESSED (FDLP)

Archival Responsibilities

     GPO will coordinate with NARA to transfer electronic
     information which no longer warrants maintenance for the
     FDLP to NARA. If GPO places agency data on a server and
     makes it available via GPO Access, then the data becomes
     part of GPO records and GPO will be responsible for its
     disposition (or transfer) to NARA. If an agency has
     maintained electronic Government information and GPO points
     to the information for the FDLP, it will be the legal
     responsibility of the individual agency to transfer this
     information to NARA.

     GPO and NARA will need to determine whether statutory
     changes are needed to clarify each agency's respective
     roles and responsibilities for extended access and
     preservation of electronic information dissemination
     products and services.

Life-Cycle of Electronic Information Dissemination Products and
Services

     GPO and NARA will need to define a life-cycle for electronic
     information dissemination products and services, beginning
     with the creation of the original document as an electronic
     file and ending with its disposition. It is NARA's responsibility to
     determine whether an electronic information dissemination
     product warrants permanent retention or no longer warrants
     continued preservation by the Government.

     In accordance with the goal of providing extended access,
     GPO will assume such costs as data preparation for mounting,
     maintenance, storage, and ongoing costs to minimize
     deterioration and assure technological currency.

Format Standards

     GPO plans to receive electronic information provided by
     agencies in any format. However, GPO needs to consider
     designating a small number of "recommended standard
     formats" for agency information, prior to receipt.
     Also, GPO will need to develop standards for formats of data
     that have been received and need to be mounted on GPO Access
     for public availability. It is anticipated that certain
     electronic source files provided to GPO by agencies will not
     readily lend themselves to GPO Access in their original
     formats. Steps may need to be taken to make information
     received in these types of formats more suitable for
     extended access.

     GPO will offer this information to NARA once it is
     determined that usage no longer warrants maintaining the
     information at a GPO authorized site. This does not imply
     that GPO will assume the responsibility of converting this
     information for NARA if the file format used for extended
     access through GPO Access is not suitable for the
     preservation requirements of NARA. It is expected that GPO
     will hold some electronic information whose usage no longer
     warrants maintenance but which NARA will not accept because
     of its file format. GPO and NARA must seek to coordinate their
     efforts to assure that format standards used by GPO for
     extended access to electronic information can be converted
     easily to formats acceptable to NARA.

Software Dependent Information

     Some electronic information dissemination products and
     services produced by agencies in particular formats (such as
     certain types of spreadsheet files) contain embedded file
     structures that have intrinsic value only when used with
     particular software. If this information is converted to
     another generic format, such as ASCII, it loses value for
     the user. This poses a concern for GPO, which will need to
     make this information available via GPO Access, and NARA,
     which currently will not accept electronic information that
     is software dependent.

ARCHIVAL BACKGROUND

The OTA Web Site contains two main types of information: 1)
Organizational Structure and Members, and 2) Publications. The
organizational structure and the lists of Technology Assessment
Board (TAB) and Technology Assessment Advisory Council (TAAC)
members can be found in the annual reports of OTA, which are
scheduled for permanent retention under N1-444-94-1. Additional
information on the members' work with OTA is scheduled as
permanent in TAB/TAAC Member Files. The site also contains
information on ongoing projects (moot issue), how to contact
the staff, different online methods of obtaining publications,
and links to other government sites.

All of the information in the OTA Web Site has been scheduled as
part of a variety of records covered by different items in the
schedule. However, the schedule does not directly apply to the
OTA Web Site itself. The OTA Web Site can be viewed as another
"publication" used by OTA to disseminate information. The
existence of the Web Site, as well as its content, provides
evidence of the image OTA wanted to portray to the public and the
work it accomplished. Even though the information exists, in
bits and pieces, among the records of OTA (records covered by the
schedule), by bringing this information together, and "packaging"
it in a different way, OTA has created a different record that is
not covered in the schedule. Thus, the OTA Web Site should be
scheduled as an item under the office that manages and maintains
the Web Site.

In FY 1995, the National Archives, Center for Electronic Records,
scheduled and appraised the ASCII text files of the 1994 and 1995
reports (N1-444-94-1). These ASCII files were appraised as
temporary because they do not contain the graphs, charts, and
photographs which are integral to the publication, thus
diminishing their value. At present, the Center for Electronic
Records will not accession files that depend on any specific
software package (a condition referred to as software
dependence). This precludes the Center from accessioning the
reports produced using Adobe software. For these reasons, NARA
has chosen to maintain the print formats of all the reports
produced by OTA. Of the OTA electronic information, NARA will
accession only the ASCII text file used to create the Catalog of
Publications, 1972-1995 (N1-444-96-1), which is used to upload
the Catalog onto the OTA Web Site. Since OTA is able to send the
file in the software-independent format specified in 36 CFR
1228.188, OTA will transfer the file directly to NARA, Center for
Electronic Records.

NARA also will receive electronic versions of the OTA reports in
three different formats: ASCII, Hypertext Markup Language (HTML),
and PDF. These files will not be accessioned by NARA, but will
be used to examine technical issues of the different formats.
However, NARA may retain the HTML and/or PDF format for a limited
time as an extra copy for convenience of reference.
HTML files are essentially ASCII files that contain text which
is "tagged" using a standardized language. HTML was created as a
standardized way to format documents so that they could be read
and interpreted by a variety of different computer platforms.
HTML commands are written using ASCII characters. Any word
processing software package can be used to tag a document with
HTML commands, although there are also software packages that
were developed specifically to "mark up" documents with HTML
commands. If a tagged document is printed out, the HTML commands
are visible along with the text of the document. Therefore these
files are software independent and can be treated as ASCII files.
If needed, PDF files also can be converted to ASCII. Even though
all these files are, or can be converted into, software
independent files, the original reports contain graphics, which
cannot be made software independent. PDF files contain graphics,
and the HTML files contain links to graphics; that is, the
graphics "reside" elsewhere, not in the tagged document.
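The distinction drawn above, between the tagged text of an HTML file and the graphics it only points to, can be illustrated with a short Python sketch. It assumes Python 3 and its standard html.parser module; the class name TextAndGraphicsExtractor and the sample page are hypothetical and do not come from the OTA files. It keeps the readable text (what would survive as plain ASCII) and lists the image files named by <img> tags, which reside outside the tagged document.

```python
from html.parser import HTMLParser


class TextAndGraphicsExtractor(HTMLParser):
    """Split a tagged HTML page into its readable text and its graphics links.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []  # character data a reader would see
        self.graphics = []    # "src" of each <img>: files stored elsewhere

    def handle_starttag(self, tag, attrs):
        # Graphics are not inside the HTML file; an <img> tag only names them.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.graphics.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)

    def plain_text(self):
        """Return the page text with the HTML tags removed."""
        return " ".join(self.text_parts)


# Hypothetical tagged page; the content and file name are illustrative only.
SAMPLE = ('<html><body><h1>Chapter 1</h1><p>Summary of findings.</p>'
          '<img src="fig1.gif" alt="Figure 1"></body></html>')

parser = TextAndGraphicsExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.plain_text())  # Chapter 1 Summary of findings.
print(parser.graphics)      # ['fig1.gif']
```

The text survives the conversion to plain ASCII, but the figure survives only as a file name, which is why the graphics must be preserved separately if at all.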

APPRAISAL CONSIDERATIONS

What information is in the Home Page, and which files (and
addresses) does it link to? What is the structure/"hierarchy" of
the Site?

     There is a distinction between a Home Page and a Web Site. A
     Home Page is the first "page" of a site. It usually
     contains an introduction or welcome statement. The Home
     Page provides links to other pages. There are two main types
     of links: a) links to other files (pages) in the same
     location, and b) links to other Web sites. A Web Site can
     be described as the sum of a Home Page and all the files
     that are linked to it. It is important to determine which
     file is the Home Page and to trace how the other pages are
     linked to the Home Page and to one another. The structure of
     the site can provide evidence of what the agency considers
     its primary mission to be and how it wants to portray itself
     to the general public.
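The tracing described above (start at the Home Page, follow links to files in the same location, and separate out links to other Web sites) can be sketched in Python as a breadth-first walk. This is an illustrative assumption, not a procedure from the report: the function name map_site is hypothetical, the link map stands in for pages that would actually be retrieved and parsed, and every address in it is hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def map_site(home_url, links_by_page):
    """Trace a Web Site outward from its Home Page (breadth-first).

    links_by_page maps each page's absolute URL to the hrefs found on it;
    it stands in for fetching and parsing real pages. Returns the pages of
    the same site in the order reached, and the hosts of external sites.
    """
    home_host = urlparse(home_url).netloc
    seen = {home_url}
    order = []
    external_sites = set()
    queue = deque([home_url])
    while queue:
        page = queue.popleft()
        order.append(page)
        for href in links_by_page.get(page, []):
            # Resolve relative links and drop "#section" fragments.
            target, _ = urldefrag(urljoin(page, href))
            parts = urlparse(target)
            if parts.scheme not in ("http", "https"):
                continue  # e.g. mailto: links are not pages
            if parts.netloc != home_host:
                # Link to another Web site: note its host, do not follow it.
                external_sites.add(parts.netloc)
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return order, external_sites


# Hypothetical link map; none of these addresses come from the report.
LINKS = {
    "http://www.example.gov/": ["about.html", "pubs/index.html",
                                "http://www.other-agency.gov/",
                                "mailto:info@example.gov"],
    "http://www.example.gov/pubs/index.html": ["report1.html",
                                               "../about.html#staff"],
}

pages, external = map_site("http://www.example.gov/", LINKS)
for p in pages:
    print(p)
print(sorted(external))  # ['www.other-agency.gov']
```

The list of pages reflects the hierarchy of the site as reached from the Home Page, while the external hosts are only recorded, consistent with appraising links to other sites with the records of the agencies that maintain them.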

Need to determine criteria/"draw lines" to limit the "links" that
will be appraised.

     In appraising a Site it is necessary to examine the Home
     Page and the files that are linked. However, the links to
     other sites should be appraised with the records of the
     agencies that maintain those sites. If there is a link to a
     site that maintains information for the site being
     appraised, and the agency whose records are being appraised
     is responsible for the content, then that particular link
     should be considered for appraisal. This does not mean that
     a whole new site is to be appraised along with the first
     site. A precedent for this can be found in N1-149-95-1P,
     Item 20.8, VAX Client Server, memo from NSXA to NIR dated
     January 9, 1995: "[Electronic Photocomposition Division
     (EPD)] uploads the publications, which they receive on tape
     or disk. EPD is not responsible for the creation or content
     of the publications. The individual agencies that send the
     publications to be uploaded into the system are
     responsible for all the data and information. For these
     reasons, the files in the VAX Client Server should not be
     appraised as GPO records..."

Which files within a site should be accessioned? Do all the files
need to be brought in? Is it adequate to simply document that a
particular link contained certain information which can be
obtained among the other records of the agency? If links to
other sites, document the name and agency which maintained the
site?

     The determination of specific files in a Web Site that
     should be accessioned and which links should be documented
     or appraised must be done on a case-by-case basis.

APPRAISAL ALTERNATIVES

Alternative A

Accession the records of the persons or committees responsible
for maintaining the Web Site. The records of these persons or
committees should reflect the content and structure of the site.
In fact, these files serve as documentation of the electronic
files posted on the Web Site. Thus, the information that appeared
on the Web Site could be reconstructed. In this case, we would
be documenting the existence of a Web Site without actually
accessioning the information on the Web Site.

Benefits

   This approach avoids duplicating information that NARA would
     otherwise be accessioning. The information provided by the
     persons or committees in charge of the Site would provide
     researchers with evidence of the information that was
     posted, and researchers could then search out the desired
     documents from the records of that agency. This would be
     especially true of larger agencies, which strictly control
     the information on their Web sites.

Disadvantages/Problems

   Not all agencies have a centralized place where this
     information can be found. In smaller agencies, the Web
     sites might be constructed and maintained by interns or
     interested personnel, yet their records may not provide
     adequate information on the content and structure of the Web
     Site.

   This option also ignores the possibility that in the future,
     the information posted on the Web site might not appear in
     any other format. In these cases, it is necessary not only
     to appraise the records of those maintaining the files, but
     also the files on the Web site itself.

Alternative B

Accession all the files within the Web Site. These could be
viewed through a browser. However, it is important to note that
different browsers will "interpret" the HTML commands
differently. Also, most Web sites contain links to graphics and
to other sites, and those links, along with the graphics they
point to, would not be functional in the accessioned copy. In
this case, the links can be documented by identifying the
institution maintaining each linked site and providing a brief
description of the content of those sites.

Benefits

   The Web site can be preserved in a form that researchers
     will be able to "navigate" through. Researchers would also
     get a better idea of the structure of the site.

Disadvantages/Problems

   At the moment, graphics, which are an integral part of most
     Web sites, cannot be preserved.

   The sheer size of some Web sites and the number of links
     that must be accounted for make them difficult to document.

   The possibility exists for duplicating information that
     already exists among the records of the agency.

Alternative C

Accession selected files from the Web Site, and also preserve
the records of the persons, offices, or committees maintaining
the Site. Valuable files, which may not exist in any other
format or are more valuable in electronic format, can be
preserved. These files could either be requested from the agency
without HTML markup (in plain ASCII), or NARA could maintain
the markup.

Benefits

   This approach ensures the preservation of unique files or
     valuable information without the burden of accessioning the
     whole site.

Disadvantages/Problems

   In accessioning select files, it is important to document
     the context. The documentation package would include not
     only technical information, but also information on the
     content of the site where the selected file was originally
     placed.

Web sites are always changing. Files can easily be added,
updated, and deleted. This poses a problem for accessioning
files in a Web site. The solution proposed in the "Preserving
Digital Information: Draft Report of the Task Force on Archiving
of Digital Information" (August 24, 1995) is to take "periodic
snapshots" of the pages in a site. Ultimately, the agency is
responsible for scheduling the files in its Web site. NARA can
work with the agency to develop a strategy for accessioning files
that are constantly being changed.
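The report does not say how successive snapshots would be compared. One possible approach, sketched here in Python as an assumption rather than a recommendation of the Task Force, is to record a SHA-256 digest of each page in every snapshot, so that pages added, deleted, or updated between snapshots can be identified. The function names and the page names and contents are hypothetical.

```python
import hashlib


def fingerprint(snapshot):
    """Map each page name to a SHA-256 digest of its content."""
    return {page: hashlib.sha256(body.encode("utf-8")).hexdigest()
            for page, body in snapshot.items()}


def compare_snapshots(earlier, later):
    """Report which pages were added, deleted, or updated between snapshots."""
    old, new = fingerprint(earlier), fingerprint(later)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A page present in both snapshots changed if its digest differs.
    updated = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, deleted, updated


# Two hypothetical snapshots of the same site, taken at different times.
snapshot_1 = {"index.html": "Welcome",
              "projects.html": "Ongoing projects: A, B"}
snapshot_2 = {"index.html": "Welcome",
              "projects.html": "Ongoing projects: B",
              "contact.html": "How to contact the staff"}

added, deleted, updated = compare_snapshots(snapshot_1, snapshot_2)
print(added, deleted, updated)  # ['contact.html'] [] ['projects.html']
```

Under this assumption, only the pages reported as added or updated would need to be captured again at each snapshot, while unchanged pages could be documented by reference to the earlier copy.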

ISSUES TO BE ADDRESSED

Identifying Information for Preservation

     How can Web sites with valuable information be identified?
     Federal agencies are creating a large number of Web sites.
     Once agencies are no longer interested in maintaining that
     information, there is no mechanism in place to preserve that
     information for future users. Both GPO and NARA share an
     interest in preserving this information for future use.
     However, as Federal records, the Web sites must be scheduled
     along with other agency records. Therefore, records
     schedules could serve as a tool to identify valuable
     Government information on Web sites.

Transfer of Information to GPO and NARA

     Once identified, what information from the Web sites should
     be transferred? As explained earlier, GPO and NARA have
     different goals. Each agency will have to decide what
     information on the Web sites will be of value to its
     customers. Sometimes both agencies will be interested in the
     same information. However, GPO is primarily interested in
     providing information for short-term access. Since NARA is
     interested in maintaining information with historic value
     indefinitely, it needs to apply criteria for determining
     which information from Web sites warrants continued
     preservation by the Government. How should this information
     be transferred to GPO and/or NARA without added burden to
     the agencies? GPO and NARA will have to work together to
     identify ways in which agencies can transfer the information
     without an added burden.

Extended FDLP Access to Electronic Information Dissemination
Products and Services

     What is the most cost-effective and useful method for
     preserving FDLP access to electronic Government information
     available from agency Web sites or online services? The
     maintenance and migration of electronic information over a
     period of years can be very costly. If information has
     already been distributed in paper, microfiche, or CD-ROM,
     does it make sense to provide continued online access to
     the information? If an agency decides to discontinue access
     to information through its Web site, does GPO have a
     responsibility to obtain the information and provide funds
     and resources for its continued access through the FDLP?

Differences Between the Life-Cycle of Information Dissemination
Products and Services in Electronic vs. Traditional Formats

     How is the lifecycle for electronic information different
     from that of traditional formats like paper and microfiche?
     What part of the information dissemination process must be
     changed in order to ensure extended access and the
     archivability of information on agency Web sites?

*****************************************************************

Judy Russell <jrussell@gpo.gov>
