December 2004 Report
This page contains the full report on the ERUS Project as of December 2004 (minus appendices, which are linked from within this page and may be printed separately). For further information on the ERUS Project, please see the main ERUS page, or contact Caryn Anderson at caryn.anderson@simmons.edu.
Introduction

In every field of endeavor, there is always room for the perspectives of students too new to the field to believe there are things that can't be done. In spite of shaking heads and knowing glances, the elders realize that it is often the naive energy and motivation of these neophytes that expands the boundaries of the discipline, even if the students' goals are not entirely fulfilled. The nascent realm of electronic resource usage statistics is so new that even the elders aren't sure what is and isn't possible. It was within this environment that the ERUS project was initiated to develop an integrated database for collecting, managing and analyzing electronic resource usage statistics. Between April and December 2004, the degree to which the ambition of the project exceeded what was practical or possible seemed to grow exponentially with the number of steps completed. As of December 2004, historical and current literature has been reviewed, surveys of the field have been conducted, user needs have been analyzed, structures have been designed, data collection options have been explored, and a prototype has been partially developed. At this point, the successful completion of a functional, web-accessible database that would serve multiple institutions (whether under a repository structure or as a set of open source files and guidelines adaptable by others) appears much further away than initially hoped. Even a very simple version of the database has become increasingly difficult to create due to the diversity of vendor and product types, the diversity of institutional needs, the rapidly changing state of the field, and the persistently marginal impact of usage statistics on decision making about electronic resources. The matter of impact represents a classic paradox. The development of better systems for collecting, managing and analyzing usage statistics would increase the potential impact of these statistics.
But the imprecise nature of the data, the difficulty of explaining statistical analysis to other staff and decision makers, and the purely supporting role of statistics in any decision-making process have restricted the resources (staff time and money) available for improving the very systems that would, in turn, overcome these disadvantages. Nevertheless, each step forward enables a slightly clearer view of the landscape, and if the challenges of this project can cast even a faint light on the shadowy path of usage statistics, then the effort will not have been in vain. If nothing else, there is value in sharing the process in order to guide others embarking on similar ventures. The following pages review the history of the ERUS project to date, discuss the accomplishments of each of the key stages completed (review of the field, user needs, database construction and data collection), and outline its current status, including the next steps for moving the development process forward.

History

Chronology

The ERUS Project began in the fall of 2003 with an innocent question from a new library and information science graduate student: "How do you know who is using what?" The eye-rolling was the first clue that online resources were a slippery frontier full of moving, morphing targets, frequently out of focus. Further conversations with Megan Fox, Web and Electronic Resources Librarian for Simmons College, revealed that electronic resource vendors provided statistics about the use of their products in a variety of formats (online, e-mail, text, Excel, CSV, etc.) and tracked a variety of elements (hits, searches, sessions, types, etc.). Few of the statistics, if any, were comparable, because vendors used different definitions for each measure. And, in order to see all the statistics in one place, Megan was maintaining an enormous Excel spreadsheet.
This file required an obscene amount of labor to massage the statistics retrieved from various vendors into one mammoth spreadsheet. Separately from vendor-provided statistics, Megan was also reviewing the statistics that could be gleaned from the technology systems on campus (tracking on-campus IP activity as well as proxy server actions). Even when analysis was narrowed to vendor statistics alone, it was virtually impossible to tell how users were reaching resources (from a search inside another database? from the OPAC? from a bookmark?). But setting these access issues aside, it became clear that a system that could simply hold all the statistics in a single, easily accessible place would provide a significant advantage over Megan's super-sized spreadsheet. After the completion of a database class, the ERUS project was formed in earnest as an independent study project to create a system to meet Megan's needs while simultaneously serving as a vehicle for learning how to build web-accessible databases with PHP and MySQL. Assistant Professor Dr. Gary Geisler served as faculty advisor, providing guidance on the technical aspects of building a database using the PHP scripting language in combination with the MySQL database system (served via an Apache web server). Megan Fox served as staff supervisor, providing guidance on content issues and user specifications. Activities in the Spring of 2004 included a review of the state of the field of electronic resource usage statistics. Relevant literature was explored along with current initiatives working to standardize or systematize pieces of the usage statistics puzzle. In addition, a full analysis was undertaken of the ways different vendors provided statistics to Simmons College. The results of this investigation informed a preliminary database design.
An informal survey was undertaken in the Summer of 2004 to get a better sense of the "real world" as it compared to the literature and the Simmons-focused review. Although the survey did reveal some patterns in the needs and desires of staff responsible for collecting and analyzing usage statistics, the diversity of responses noticeably exceeded their similarity. (See the Survey Results.) Perhaps the most advantageous outcome of the survey was the opportunity to connect with two talented individuals working on the problem of managing e-resource usage statistics at other institutions. (See Partnerships below.) In the Fall of 2004, development of the database began in earnest. The self-education in PHP and MySQL (partially learned in a Database Management class in the Spring) was extremely challenging but eventually resulted in a simple, functional database accessible from a web interface. During this time, meetings with partners expanded the collaborative effort, and attendance at a key workshop dedicated to electronic usage statistics provided the opportunity to hear the very latest in the field as well as to meet and question the pioneers in this province of the Electronic Resource Management landscape. (View NFAIS Conference Report Notes.) By December of 2004, a fairly stable database structure was in place, providing the foundation upon which partners at Trinity College and Villanova University could work to resolve challenges with data collection. While continued problems with PHP and data loading errors prevented the completion of a working prototype by the end of the semester, many lessons were learned and a strong foundation was established for the further development of an integrated system.

Partnerships

Lori Stethers, Systems Librarian for Trinity College in Hartford, Connecticut, posted an inquiry to the ERIL-L listserv (Electronic Resources in Libraries) just a few weeks before the ERUS survey was posted to the same list.
Lori was looking for guidance from others who were developing systems for coordinating the collection and analysis of usage statistics. After following up with Lori, who was far more familiar with programming languages, it became clear that both parties could benefit from coordinating their efforts. Lori agreed to provide assistance with data collection through her scripting expertise, in support of the web-accessible database structure being designed at Simmons. With an agreement to share and combine data in the process of developing a system that would be useful for both institutions, the first partnership was born. As of December 2004, Lori has developed preliminary scripts for extracting data from scheduled statistics reports sent by vendors via e-mail and for importing them into the ERUS database. She has been communicating and comparing notes with another partner, Andrew Nagy, at Villanova University. When the survey was posted to the ERIL-L listserv, Andrew Nagy, Library Technology Development Specialist at Villanova University, made contact to say that he was developing something very similar to the ERUS project. He was curious to learn more about the ERUS project and agreed to share his own work. Immediately following the NFAIS usage statistics conference in early October, a meeting with Andrew and other Villanova staff in Pennsylvania uncovered similarities and differences between Andrew's Library Statistics Gathering and Reporting (LibSGR) system and the ERUS project. While both systems were focused on collecting all vendor statistics into a single, web-accessible database, the LibSGR project was focused exclusively on electronic journals, while the ERUS project was focused initially on indexing and abstracting databases.
It was agreed that if the ERUS project could refine its structure to more closely match that of LibSGR, then the two systems could be developed in tandem and ultimately brought together in a single super-system for handling the bulk of electronic resources in most academic libraries (initially excluding e-books and e-reference materials). In addition, Andrew's extensive programming background also enabled him to work on various scripting techniques for the persistent challenge of collecting data from diverse sources. As of December 2004, the ERUS Project database structure has been mapped to the LibSGR structure, and Andrew has been in communication with Lori about data collection strategies. With this partnership among three different types of academic institutions, it is hoped that a reasonably comprehensive and flexible system can be developed that will meet many of the needs of different institutions.

State of the Field

Activities

In the spring of 2004, a review of the literature and initiatives focused on electronic resource usage statistics uncovered a nascent field of determined academics and practitioners desperately struggling to bring order to the new and chaotic world of online databases, journals and other electronic resources. By this time, the COUNTER Project (Counting Online Usage of NeTworked Electronic Resources) had released the first version of its standard for vendors to follow in producing and providing usage statistics for electronic journals and databases. Only three of the major vendors serving Simmons (out of more than 40 suppliers in total) were COUNTER compliant, but the sense in the field seemed to be that, although COUNTER would not solve every problem, standardization would be a very important step forward in collecting, managing and analyzing usage statistics. In addition to the COUNTER Project, multiple other initiatives and forums had emerged in response to different aspects of the issue.
For example, as Dr. David Goodman explained at the NFAIS workshop on online usage statistics in New York in October 2004, NISO [National Information Standards Organization] is motivated by issues of general library management, while ICOLC [International Coalition of Library Consortia] is particularly interested in desired practices for libraries negotiating contracts with vendors. The COUNTER project is specifically focused on the value of usage data for both publishers and libraries. Conveniently, the NFAIS forum on "Online Usage Statistics" was scheduled right in the middle of the semester, and provided amazing insight into the scope and condition of the field. (View the NFAIS Conference Report Notes and the list of Initiatives and Forums.)

Literature Review

Much of the available literature on usage statistics focused primarily on electronic serials. Judy Luther's seminal white paper, for example, stands as a key document outlining the unique challenges associated with tracking usage of online journals, but does not directly address some of the very specific issues associated with usage of databases (e.g. measuring the usage associated with links to full-text versions of articles from citations both in and outside of the database). The other major theme in the usage statistics literature was the distinction between vendor-provided statistics and those collected locally by institutions' technology departments. While both of these perspectives were important, they did not fully address the issue the ERUS Project was concerned with: namely, how to manage statistics for all of an institution's resources under the imperfect and chaotic conditions that exist right now. On one hand, it was encouraging that the ERUS Project was actually filling a gap in the knowledge and practice base. On the other hand, there appeared to be no footsteps to follow, which made the whole endeavor that much more difficult.
Most of the remaining literature described the COUNTER project in various contexts. (View the Annotated Bibliography.)

Tools

Many of the initiatives and forums have generated specific tools for learning about, or facilitating, the usage statistics processes of collection, analysis and management. These tools range from video conferences and PowerPoint presentations to modular, online self-education programs and standardized Entity-Relationship diagrams with Data Dictionaries designed for the development of Electronic Resource Management systems. (View the ERM Tools.)

User Needs Assessment

Megan Fox / Simmons College

The review of relevant literature and initiatives was conducted in conjunction with an analysis of the resources subscribed to by Simmons College and the way usage statistics were provided for each resource. As a result of this analysis, some preliminary parameters were established for developing a database to help manage the Simmons resources. To keep the project manageable, it was decided to focus only on vendor-provided statistics and to prioritize the work based on the vendors that had registered the most usage in the previous two years. Discussions with Megan about her interest in an integrated usage statistics database not only highlighted a desire for more sophisticated analysis functions, like cost per search and comparison with other institutions of similar size, but also inspired the idea of creating a repository where statistics could be held for multiple institutions. This in turn led to the decision to conduct an informal survey to get a sense of the common issues and desires of other information and library staff charged with tracking and analyzing usage statistics.

Survey

A 20-question survey was designed.
In the Summer of 2004 this survey was presented to the ERIL-L listserv (Electronic Resources in Libraries) and generated 29 responses. Although the survey was informal, the responses made it fairly clear that any usage statistics system ought to focus primarily on automating the data collection process and on storing data in the smallest reasonable units so that customized reporting is more feasible. The lack of standardization, and of vendor compliance with existing standards, was commonly identified as a significant obstacle. These are problems that can only be solved by advocacy with vendors and support of standardization efforts, but it still seemed possible that a tool could be developed to assist electronic resource librarians in managing usage statistics in the current, chaotic, unstandardized environment. (View the Survey questions. View the Survey Results Summaries.)

Database Construction

Database Design

The process of designing the database was stalled for a long time by questions about what the actual "entities" are that must be considered the key players in the complex electronic resource environment. There are vendors, publishers, services, databases, journals, e-books, reference books/databases, statistical databases and more. All these different types of suppliers and resources have very complicated relationships with each other, and at least half a dozen iterations of the preliminary entity-relationship diagram were generated before settling on a design based on the simplest basic units, with the complex relationships represented through recursive relationships and linking tables. The Entity-Relationship (ER) diagram and schema/data dictionary were actively revised as the database was built, and substantially so after the meeting with Villanova in order to reconcile the two structures. The Database Structure spreadsheet is an adapted schema based on mapping the Villanova and Simmons relations onto each other.
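The design approach described above (simplest basic units, recursive relationships, linking tables), together with the survey's call for storing data in the smallest reasonable units, can be illustrated with a minimal schema sketch. It is not the actual ERUS or LibSGR schema: every table and column name below (`supplier`, `resource`, `parent_id`, `supplier_resource`, `usage_count`) is hypothetical, and the sample rows are invented. Python's built-in `sqlite3` module is used only so the sketch runs on its own; the project itself used MySQL. The `WITH RECURSIVE` query relies on SQLite, and MySQL releases of that era did not support recursive queries, so there the nesting would have to be walked in application code.

```python
import sqlite3

# Hypothetical schema for illustration only; names do not come from ERUS or LibSGR.
SCHEMA = """
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Recursive relationship: a resource may belong to another resource
-- (e.g. a journal title offered within a database package).
CREATE TABLE resource (
    resource_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    parent_id   INTEGER REFERENCES resource(resource_id)
);
-- Linking table: many suppliers relate to many resources, each in a named role.
CREATE TABLE supplier_resource (
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
    resource_id INTEGER NOT NULL REFERENCES resource(resource_id),
    role        TEXT NOT NULL,  -- e.g. 'vendor', 'publisher'
    PRIMARY KEY (supplier_id, resource_id, role)
);
-- Usage kept in the smallest practical unit: one count per
-- resource, supplier, metric and month.
CREATE TABLE usage_count (
    resource_id INTEGER NOT NULL REFERENCES resource(resource_id),
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
    metric      TEXT NOT NULL,  -- e.g. 'searches', 'sessions'
    month       TEXT NOT NULL,  -- 'YYYY-MM', so text comparison sorts by date
    event_count INTEGER NOT NULL,
    PRIMARY KEY (resource_id, supplier_id, metric, month)
);
"""


def build_demo_db():
    """Create an in-memory database populated with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO supplier VALUES (1, 'Example Vendor')")
    conn.execute("INSERT INTO resource VALUES (10, 'Example Database', NULL)")
    conn.execute("INSERT INTO resource VALUES (11, 'Example Journal', 10)")
    conn.executemany(
        "INSERT INTO supplier_resource VALUES (?, ?, ?)",
        [(1, 10, "vendor"), (1, 11, "publisher")],
    )
    conn.executemany(
        "INSERT INTO usage_count VALUES (?, ?, ?, ?, ?)",
        [
            (10, 1, "searches", "2004-09", 120),
            (10, 1, "searches", "2004-10", 150),
            (10, 1, "sessions", "2004-10", 60),
        ],
    )
    return conn


def total_by_resource(conn, metric, start, end):
    """Sum monthly counts for one metric over an arbitrary month range."""
    rows = conn.execute(
        """SELECT r.title, SUM(u.event_count)
           FROM usage_count AS u
           JOIN resource AS r ON r.resource_id = u.resource_id
           WHERE u.metric = ? AND u.month BETWEEN ? AND ?
           GROUP BY r.title
           ORDER BY r.title""",
        (metric, start, end),
    )
    return rows.fetchall()


def contained_titles(conn, root_id):
    """List every resource nested under root_id by following parent_id links."""
    rows = conn.execute(
        """WITH RECURSIVE tree(id) AS (
               SELECT resource_id FROM resource WHERE parent_id = ?
               UNION ALL
               SELECT r.resource_id FROM resource AS r
               JOIN tree AS t ON r.parent_id = t.id
           )
           SELECT title FROM resource
           WHERE resource_id IN (SELECT id FROM tree)
           ORDER BY title""",
        (root_id,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(total_by_resource(db, "searches", "2004-09", "2004-10"))
    print(contained_titles(db, 10))
```

Run as a script, this prints `[('Example Database', 270)]` and `['Example Journal']`. Because each count is stored per resource, supplier, metric and month, any date range or grouping can be reported later by summing rows, rather than by reshaping a spreadsheet.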
The actual ER diagram and Relational Schema/Data Dictionary have yet to be updated to reflect these changes, because a few items are still being worked out as the database is finished. As of December 2004, the Database Structure document provides the most current structure, but the following image links to a PPT version of the October 2004 ER diagram that is useful for visualizing the database structure.

View the Database Design documents under development (updated versions will be posted as soon as possible):
Technology

Open source tools were selected for creating and serving the database for three reasons:
1. There were no direct costs (programs/files were free), and only limited indirect costs (training materials and labor).
2. One of the goals of the project was to design a system that could potentially be shared with others. Building on open source software would make the database easily portable.
3. Technical assistance from the faculty advisor was critical, and Dr. Gary Geisler was highly proficient in the PHP, MySQL, Apache trio from his work in developing the Open Video Project online digital video library.

The learning curve for the programming and database languages and the web serving platform was steeper than anticipated and significantly delayed the development of the prototype. Upon reflection, it would have been wiser to select a less ambitious and complex project as a vehicle for learning to build web-accessible databases, and the project itself would have been better served by pre-existing proficiency with PHP and MySQL. On the other hand, an advantage of building the database with "beginner-level" PHP and MySQL code and structure is that the technical proficiency bar will be conveniently low for other institutions of varying sizes wishing to adopt and adapt the system for their own organizations.

Web Interface

As of December 2004, Simmons College does not have a database server available to students. As a result, the functional database prototype is available only on the personal computer of Caryn Anderson. The version of the web site that is available over the internet consists of static .html pages that show what retrieved results would look like in the PHP-driven database. Once the full database can be served publicly, access will be restricted to partners by login, though the .html version of the site will remain available to provide examples for non-partners.
The web interface for the "guts" of the ERUS Project consists of the areas of the web site identified as Run Reports and Import Statistics.
The web site also contains sections designed to support the user in maintaining and expanding the depth of their analysis (Manage Profiles and Lists) within the ERUS system as well as maintaining and expanding their own knowledge and systems (Reference).
The Project Description section, reached from the Home page, includes a basic description of the motivation and purpose of the project, the current Project Status, and links to the Survey and the Survey Results. The following image links to a PPT version of the site map of the web site as of December 2004.

Data Collection

Background

The problem of data collection has been perhaps the most onerous of the entire project. The creation of a database to hold all the diverse statistics is, conceptually, not all that different from the giant Excel spreadsheets maintained by countless librarians and information managers the world over. It is the collection of the data from dozens and dozens of different sources that tries the soul. Even with the vendors that are COUNTER compliant, there are different URLs for accessing statistics, different passwords for different levels of access, different protocols for creating the standard reports, different formats available for those reports, and highly variable degrees of data manipulation required to prepare the data for even the simplest analysis. The ERUS Project viewed this piece of the project as critical, but could hardly begin to consider it seriously until the basic structure into which the data would be imported was established. It was great good fortune that the partnerships with Trinity College and Villanova University were formed, enabling these most technologically challenging pieces to be tackled by far more experienced professionals. At first, Andrew and Lori had both independently considered designing scripts to request the URLs of the vendors' statistics sites, log in, and extract the data directly, but both were quickly deterred by the security features of the various web sites.
Lori then turned to the popular option of scheduling standard reports to be delivered by the vendor to the institution by e-mail. It was anticipated that scripts could be designed more easily to extract the appropriate data from the e-mails themselves. Whatever security features the scripts needed would be executed only within the institution's network, rather than over the web, which made the approach much more appealing. Meanwhile, Andrew had begun designing scripts to get into certain vendor sites whose "over the web" security requirements were less demanding. By the time Lori and Andrew began communicating about their strategies, Lori had begun to realize that although her technique looked likely to pay off, creating the script for a single vendor was taking much longer than she had expected. It was also clear that each vendor (even those providing COUNTER-compliant statistics) was going to require a great deal of customization. In facing this challenge, it is encouraging that Andrew's approach appeared to be paying off with other vendors, so that some combination of the two strategies might cover the bulk of the vendors.

Moving Forward

As with the database itself, once a few workable prototype scripts become functional, the process of modeling and customizing these scripts for other vendors should proceed more rapidly. Both Lori and Andrew have full-time jobs that restrict their activities in this area. Although both institutions have agreed to allow them to commit a portion of their time to this work, it is difficult to schedule it against the seemingly constant flow of urgent demands in the library environment. Nevertheless, they intend to press on, and once a demonstrable script is completed, the project can prioritize the vendors to be "scripted" and devise a more comprehensive strategy.
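Lori's approach of extracting data from scheduled report e-mails can be sketched with Python's standard `email` and `csv` modules. This is a hypothetical illustration, not her actual script: the sample message, its `text/csv` attachment and the `Database,Month,Searches,Sessions` column layout are all invented, and real vendor reports differ in format from vendor to vendor, which is exactly why each one needs its own customization. The sketch flattens each report row into one (resource, month, metric, count) record, matching the small-unit storage the survey respondents favored.

```python
import csv
import email
import io
from email import policy

# Invented sample of a vendor's scheduled report e-mail (hypothetical format).
SAMPLE_REPORT_EMAIL = """\
From: stats@vendor.example
To: eresources@library.example
Subject: Scheduled usage report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="REPORT-BOUNDARY"

--REPORT-BOUNDARY
Content-Type: text/plain

Your scheduled usage report is attached.
--REPORT-BOUNDARY
Content-Type: text/csv
Content-Disposition: attachment; filename="usage.csv"

Database,Month,Searches,Sessions
Example Database,2004-10,150,60
Example Index,2004-10,40,25
--REPORT-BOUNDARY--
"""


def extract_usage_rows(raw_message, metrics=("Searches", "Sessions")):
    """Return (resource, month, metric, count) tuples from CSV attachments.

    Assumes the hypothetical column layout above; each real vendor
    report would need its own column mapping.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    rows = []
    for part in msg.walk():
        # Skip the multipart container and the plain-text body.
        if part.get_content_type() != "text/csv":
            continue
        reader = csv.DictReader(io.StringIO(part.get_content()))
        for record in reader:
            # One output row per metric: the smallest reportable unit.
            for metric in metrics:
                rows.append(
                    (record["Database"], record["Month"], metric.lower(), int(record[metric]))
                )
    return rows


if __name__ == "__main__":
    for row in extract_usage_rows(SAMPLE_REPORT_EMAIL):
        print(row)
```

Because the parsing runs on mail already delivered inside the institution's network, no vendor login credentials have to be handled by the script, which is the security advantage the text describes.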
Current Project Status and Next Steps

Status

Overall, the current status of the ERUS project could best be described as "actively begun." Without a demonstrable prototype it is difficult to show recognizable progress. Nevertheless, a significant amount of background work, design work, and foundation building has occurred that will be instrumental in allowing the project to move rapidly forward once the prototype is functional. The misjudged timeline for the prototype can only be attributed to student naivete combined with extreme zeal. There is some apprehension about losing momentum now that the academic term has concluded. This is where the partnerships provide a unique additional advantage. Because the project involves stakeholders beyond a single student, or even a single institution, the potential for continued momentum is greater. In addition, this momentum can be easily guided by the extensive, yet basically reasonable, list of Next Steps.

Next Steps

The most important next steps in the ERUS Project are to finish the programming elements necessary for a functional prototype and then to determine how the project should be structured in order to move it forward. The current list of Next Steps loosely combines a variety of short-term and long-term goals and serves as an effective overview of the potential of the project. The work needed to get a functioning prototype must continue, for now, on a primarily volunteer basis. Once a working resource is available for demonstration and testing, however, it seems reasonable that a plan can then be crafted to secure funding to carry the project through to the point where it can provide the critical assistance to librarians and information managers for which it was designed.

Conclusion

The ERUS Project has proved more difficult than expected at almost every turn in the road.
Differences between vendors and their various roles as vendor and/or publisher and/or editor/creator generated continuous complications for database design, as did the variety of resources (particularly those available from more than one vendor) and product types (especially the diverse elements of databases versus electronic journals). Differences between institutions in their e-resource inventories, their types of contracts with vendors, and their technological sophistication (e.g. varying needs for integration with the ILS and other campus technologies) shifted the sands beneath the alternating visions of a full-service, multi-member repository and an open source, customizable product. On a more personal level, the learning curve on the programming technology turned out to be steeper than expected. Although PHP and MySQL are not difficult to learn, per se, the level of complexity the ERUS Project demanded from a beginner caused frustrating delays in the development of the prototype. Equally frustrating was the rapidly changing state of the field. Almost weekly, new initiatives, projects, forums or publications were being discovered that re-ordered the perspective of the field just enough to call the value of the ERUS Project temporarily into question. Sometimes this resulted in slight adjustments to the project. Other times it simply contributed contextual knowledge and additional resources. Nevertheless, the constant external stimulation, and the review it demanded, was frequently distracting. Finally, the biggest challenge has been the frustrating lack of impact of electronic resource usage statistics. This lack of impact keeps researchers and practitioners from receiving the investment necessary to develop the field into one that can make a greater contribution to e-resource management and decision making.
There is a clear absence of established best practices in electronic resource usage statistics, in the face of extreme complexity in the relationships between players, products and the vehicles that deliver them. Shrinking library budgets and the corresponding demands for quantifiable justification of expenditures also cast great shadows over the growth of new solutions. Usage statistics can only ever be one part of any decision-making process; they can't be seen in isolation. In addition, many decision makers do not easily understand statistical analysis, particularly if it is provided by a staff member not trained in how to present it. This is complicated by the fact that electronic resource usage statistics involve too many variables to provide definitive and compelling information. As a result, the real impact of usage statistics remains relatively low, and many decision makers are loath to allow more resources to be devoted to the processes that would, in turn, make the statistics more precise and generate greater impact. This paradox is one that faces most new fields. It is what has made this project more difficult than expected, but also what makes it more exciting. What sometimes feels like swimming upstream also, at times, feels like blazing new trails. It may be years before standards and protocols are worked out to the degree that commercial products can provide reliable, comprehensive and affordable services for collecting, managing and analyzing e-resource usage statistics. In the meantime, there are still many librarians and information managers who appear to be clamoring for something that is even just "good enough" to relieve them of their labor-intensive statistics burdens. In this environment, there is still real hope that the ERUS Project can meet those needs, regardless of its challenges and simplicity (or maybe because of them).
Content Updated: 2 February 2005
caryn.anderson@simmons.edu
This work is licensed under a Creative Commons License
by Caryn L. Anderson.