
Evergreen: Basic System Admin Functions

A collection of commands and tips, tricks, and spells for use with most Evergreen systems.

1. srfsh# functions, commands, etc

1.1. Checking open-ils C svcs

>>>checking open-ils C services<<<
   srfsh#  request open-ils.cstore open-ils.cstore.direct.actor.user.retrieve 1

   Note: Trimmed for brevity; look for data returned and not an error
             Received Data: {
         "__c":"au",
         "__p":[
           null,
           null,
           null,
           null,
           null,
           null,
           ----------snip-----------

1.2. Checking open-ils Perl Svcs

>>>checking open-ils PERL services<<<
   srfsh#  request open-ils.storage open-ils.storage.direct.actor.user.retrieve 1
     Note: Trimmed for brevity; look for data returned and not an error.
             Received Data: {
         "__c":"au",
         "__p":[
           null,
           null,
           null,
           null,
           null,
           null,
           null,
           null,
           ----------snip-----------
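Both checks above come down to the same question: did data come back? A minimal sketch of that test as a shell function follows. It only inspects text on standard input; the function name srfsh_reply_ok is hypothetical, and feeding commands to srfsh on standard input (shown in the comment) is an assumption about your srfsh build.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if captured srfsh output contains a
# "Received Data:" line, which is what sections 1.1 and 1.2 say to look for.
srfsh_reply_ok() {
    grep -q 'Received Data:'
}

# Possible usage (assumes srfsh accepts commands on standard input):
#   echo 'request open-ils.cstore open-ils.cstore.direct.actor.user.retrieve 1' \
#       | srfsh | srfsh_reply_ok && echo "cstore answered"
```

This does not inspect the returned object itself, so it is a quick smoke test rather than a full health check.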

1.3. Checking openils.auth

>>>checking openils.auth<<<
   srfsh# login <username> <password>
     Note: The request needs to complete successfully...the textcode does not have to be "SUCCESS"
         Received Data: "e1a8125c1b1bb364c0fe0a8e5a8f62b0"
       ------------------------------------
       Request Completed Successfully
       Request Time in seconds: 0.004942
       ------------------------------------
       Received Data: {
         "ilsevent":0,
         "textcode":"SUCCESS",
         "desc":" ",
         "pid":17863,
         "stacktrace":"oils_auth.c:312",
         "payload":{
           "authtoken":"a979d2ade9d967890d44c5eb387dcade",
           "authtime":420.000000
         }
       }
       ------------------------------------
       Request Completed Successfully
       Request Time in seconds: 0.191406
       ------------------------------------
       Login Session: a979d2ade9d967890d44c5eb387dcade.  Session timeout: 420.000000
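If the login output is captured to a file or a pipe, the authtoken can be pulled out for use in later requests. The following is a sketch only: the function name extract_authtoken is hypothetical, and it assumes the token is a hexadecimal string, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "authtoken" field found
# in srfsh login output read from standard input.
# Assumption: the token consists only of hexadecimal digits.
extract_authtoken() {
    sed -n 's/.*"authtoken":"\([0-9a-f]*\)".*/\1/p' | head -n 1
}
```

If no authtoken field is present, nothing is printed, which is itself a sign the login did not complete.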

1.4. checking ILS version

>>>checking ILS versions<<<
   srfsh# request open-ils.actor opensrf.open-ils.system.ils_version
   or
   http://<library.url>/gateway?service=open-ils.actor&method=opensrf.open-ils.system.ils_version&param=

2. Reporter Issues

2.1. Examining Reports

>>>examining currently running/overdue/scheduled reports<<<
   Say you want to see what reports claim to be running right now.  In the
   database:

       select * from reporter.currently_running;

   The name column should match report names listed in `ps ax|grep -i clark`.

   If you want to know what reports need to run ASAP (their run_time has already arrived):

       select * from reporter.overdue_reports;

   And finally, if you want to see what reports will run in the future:

       select * from reporter.pending_reports;

   Now, if you want to see what queries are currently executing, and for how long
   they've been running, ordered by the duration:

       select now() - query_start as duration,procpid,current_query from
         pg_stat_activity where current_query <> '<IDLE>' order by 1;

2.2. Reporter Stop/Start

>>>stopping reporter<<<
   ps aux | grep -i clark
   postgres 14346  0.0  0.2 158980 34312 ?        Ss   15:52   0:00 Clark Kent, waiting for trouble
   kill 14346
   cd /tmp/
   ls
   reporter-LOCK
   rm reporter-LOCK

>>>starting reporter<<<
   PERL5LIB=/openils/lib/perl5 /openils/bin/clark-kent.pl -c 1 -d
   ps aux | grep -i clark
   postgres 14370  0.0  0.2 158980 34312 ?        Ss   15:52   0:00 Clark Kent, waiting for trouble
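Picking the PID out of the ps listing by eye works, but it can also be sketched as a small filter. The function name clark_pids is hypothetical; it simply prints the second column of any ps line containing "Clark Kent".

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on standard input and print the
# PID (second column) of every line that contains "Clark Kent".
# Lines belonging to a grep command are skipped.
clark_pids() {
    awk '/Clark Kent/ && !/grep/ { print $2 }'
}

# Possible usage:
#   ps aux | clark_pids
```

The printed PIDs are the ones to pass to kill before removing /tmp/reporter-LOCK, as in the stop procedure above.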

2.3. Kill Reports or Database Processes politely

   >>>killing database processes politely<<<
   This is particularly useful for killing runaway reports.  Note that this is an email from Miker.
             ----------------snip----------------
       First, find the bad report:
       evergreen=# select now()-query_start,procpid,current_query from
         pg_stat_activity where current_query <> '<IDLE>';
                     ?column?     | procpid |                                                                                                              
       current_query                                                                                                               
       -----------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        00:00:00        |   24373 | select now()-query_start,procpid,current_query
       from pg_stat_activity where current_query <> '<IDLE>';
        06:00:12.664521 |   19681 |
       SELECT\x09"6b89db56fc7d830b621fe56999f87eba"."shortname" AS "Library Short
       (Policy) Name",
                                  :
       \x09"7fb66fd46f5581428d27fcd56ad5c85f"."circ_modifier" AS "Circulation
       Modifier",
                                  : \x09EXTRACT(YEAR FROM
       "6531077d9eb1b14d8147b82f2ffd1d6f"."checkin_time") || '-' || LPAD(EXTRACT(MONTH
       FROM "6531077d9eb1b14d8147b82f2ffd1d6f"."checkin_time"),2,'0') AS "Circulation
       -> Check In Library -> Checkins -> Check In Date/Time",
                                  :
       \x09COUNT("fce458da5814089639960610c48d19d2"."id") AS "Count"
                                  :   FROM\x09action.circulation AS
       "fce458da5814089639960610c48d19d2"
                                  : \x09INNER JOIN actor.org_unit AS
       "5ba72c6d4ac452a3221b0782d5e85614" ON
       ("fce458da5814089639960610c48d19d2"."checkin_lib" =
       "5ba72c6d4ac452a3221b0782d5e85614"."id")
                                  : \x09LEFT OUTER JOIN action.circulation AS
       "6531077d9eb1b14d8147b82f2ffd1d6f" ON ("5ba72c6d4ac452a3221b0782d5e85614"."id"
       = "6531077d9eb1b14d8147b82f2ffd1d6f"."checkin_lib")
                                  : \x09INNER JOIN asset.copy AS
       "7fb66fd46f5581428d27fcd56ad5c85f" ON
       ("fce458da5814089639960610c48d19d2"."target_copy" =
       "7fb66fd46f5581428d27fcd56ad5c85f"."id")
                                  : \x09INNER JOIN
       (2 rows)

       The second query is the baddy ... so:

       evergreen=# select pg_cancel_backend(19681);

       pg_cancel_backend
       -------------------
        t
       (1 row)

       ----------------snip----------------
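When several reports have run away at once, the cancel statements can be generated rather than typed. This is a sketch under stated assumptions: the function name cancel_statements is hypothetical, and it expects "seconds|procpid" rows such as those produced by running psql -At with a query like select extract(epoch from now()-query_start), procpid from pg_stat_activity where current_query <> '<IDLE>'. The procpid and current_query column names follow the email above; newer PostgreSQL releases name them differently.

```shell
#!/bin/sh
# Hypothetical helper: read "seconds|procpid" rows on standard input and print
# one pg_cancel_backend() statement for each row whose duration exceeds the
# limit given as the first argument (in seconds).
# Nothing is executed here; review the output before giving it to psql.
cancel_statements() {
    limit_seconds=$1
    awk -F'|' -v limit="$limit_seconds" '
        $2 ~ /^[0-9]+$/ && $1 + 0 > limit + 0 {
            printf "select pg_cancel_backend(%d);\n", $2
        }'
}
```

Printing the statements instead of running them keeps a person in the loop, which matches the "politely" spirit of the section.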

3. AutoGen

3.1. When to run AutoGen

*If you change any of the following, you must run autogen:*



1.      The org unit name

2.      The org unit shortname (policy code)

3.      The type of the org unit (System, Branch, BookMobile, etc)

4.      The position of the org unit within the tree (moved to another parent; moved up to the System level under PINES; etc)

5.      The visibility of the org unit

6.      Adding an org unit

7.      Adding an org unit type

8.      The removal of an org unit or locale

9.      If you add a locale - there is no interface for this yet (as of 1.6)

10.  If you add a new org "lasso" - there is no interface for this yet (as of 1.6)

11.  If you change the fm_IDL.xml file - developers only

4. Housekeeping

4.1. HOWTO: Extract offline logs

In the event that one decides to extract the offline logs from a session for viewing, here is a process:

Our situation is that an administrator has uploaded offline transactions within the past week and needs to see exactly what was uploaded. In this scenario, the transaction in question occurred within the past 6 days.

  • First, we need to know what has been uploaded. This is discovered by a simple search. Go to /openils/var/data/offline/archive/. There, you should see a number of numbered directories.
  • Create a directory to place the logs into. I suggest /tmp/[your new dir]; I called mine "offy" for "offlines".
  • From within the archive directory, execute `find . -mtime -6 -name '*.log' -exec cp {} /tmp/offy \;` (the quotes around *.log keep the shell from expanding the pattern before find sees it).
  • Collect your log files in the offy directory and do what you will with them.
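The steps above can be sketched as one shell function. The name collect_recent_logs and its argument order are hypothetical; the find expression is the one from the list, with the source directory, destination, and day count passed in.

```shell
#!/bin/sh
# Hypothetical helper: copy *.log files modified within the last N days from
# an archive tree into a destination directory, creating it if needed.
# Usage: collect_recent_logs SOURCE_DIR DEST_DIR DAYS
collect_recent_logs() {
    src=$1 dest=$2 days=$3
    mkdir -p "$dest" || return 1
    # The pattern is quoted so the shell passes it to find unexpanded.
    find "$src" -type f -mtime "-$days" -name '*.log' -exec cp {} "$dest" \;
}

# Possible usage:
#   collect_recent_logs /openils/var/data/offline/archive /tmp/offy 6
```

Note that logs with the same file name in different session directories will overwrite one another in the destination, exactly as with the one-line find command.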

4.2. Quickly Update Crontab for Restart

In order to quickly comment out the crontab for restarts, open crontab with vim as root and type the following:

crontab -u opensrf -e

This opens the crontab for editing. Once within vim, type the following:

:%s/^/###/g

This tells vim to insert a triple pound sign at the start of every line, thus commenting out the crontab so random scheduled scripts don't start while you are restarting and muck everything up.

To save the fine work you've just performed, type :wq [enter]

Change to the opensrf user and perform your restart. Exit back to root.

Go back into crontab:

crontab -u opensrf -e

This opens the crontab again for editing. Once within vim, type the following:

:%s/^###//

Notice that you are removing the pounds and making everything normal again.

Save it again with :wq [enter]

Keep in mind that every time a crontab is edited, the crontab is instantly reloaded.
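The same comment and uncomment steps can be sketched without opening an editor. The function names comment_all and uncomment_all are hypothetical, and the usage shown in the comments assumes your crontab command accepts "-" to read the new crontab from standard input.

```shell
#!/bin/sh
# Hypothetical helpers: filter a crontab supplied on standard input.
# comment_all prefixes every line with ###.
# uncomment_all removes a leading ### from every line that has one.
comment_all()   { sed 's/^/###/'; }
uncomment_all() { sed 's/^###//'; }

# Possible usage as root (assumes `crontab -` reads from standard input):
#   crontab -u opensrf -l | comment_all   | crontab -u opensrf -
#   crontab -u opensrf -l | uncomment_all | crontab -u opensrf -
```

Saving a copy of the original with crontab -u opensrf -l > some-backup-file first is a cheap safeguard.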

5. Circulation Policies

5.1. Renews Should Be Blocked For Items With Holds

In the /openils/var/circ/circ_permit_renew.js script, add:

log_info("permit_renew searching for potential holds for copy " + copy.barcode);
var hold = copy.fetchBestHold();
if( hold ) {
    log_info("hold found for renewal item, checking hold->usr..");
    log_info("hold = " + hold + "  hold.usr = " + hold.usr + "   patron.id = " + patron.id);
    if( hold.usr != patron.id ) {
        result.events.push('COPY_NEEDED_FOR_HOLD');
    }
}

The system may also need to have Circulate.pm patched.

See Evergreen trunk revision r13249 http://svn.open-ils.org/trac/ILS/changeset/13249 for current patch. There was an issue with the way events were called, so without this change the system will ignore the events.push above.

 

--Steve Callender 1/12/10

6. Provisioning of machines

6.1. Cloning machines: Purge ejabberd (mnesia)

When cloning EG machines (specifically, the machines running the ejabberd process), it is important that ejabberd be purged from the system (on Debian, this is `apt-get purge ejabberd`), reinstalled, and re-configured, and that its users be set up again.  The ejabberd.cfg file can (usually) be retained for reuse after the ejabberd reinstall.

A cloned ejabberd 'mnesia' database on the LAN can cause network communication issues on other machines, even if it is only responding on localhost or 127/8.  For example, a single-server brick cloned to add capacity to a cluster can result in Jabber communication failures on both bricks, presenting as request timeouts, lost listeners, and the like.

6.2. Adding brick head to talk to db

****example used is Indiana***

get to postgres box
 
sudo su - postgres
cd eg-db
edit pg_hba.conf
bottom of the file
 
find the section that has:
host all evergreen 208.119.0.201/32 md5
host all evergreen 208.119.0.202/32 md5
host all evergreen 208.119.0.203/32 md5
 
 
add
host all evergreen 208.119.0.200/32 md5
 
issue a pg_ctl reload:
/var/pg8213/bin/pg_ctl -D /var/postgresql/eg-db reload
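
Adding the line by hand is fine for one brick head; for several, a sketch like the following avoids duplicate entries. The function name add_hba_entry is hypothetical, and the entry format is copied from the example above.

```shell
#!/bin/sh
# Hypothetical helper: append an md5 host entry for the evergreen user to a
# pg_hba.conf-style file unless an identical line is already present.
# Usage: add_hba_entry FILE IP_ADDRESS
add_hba_entry() {
    file=$1 ip=$2
    entry="host all evergreen $ip/32 md5"
    # -x matches whole lines only, -F treats the entry as a fixed string.
    grep -qxF "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}
```

The duplicate check only catches lines with identical spacing, and the pg_ctl reload shown above is still needed afterwards.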

7. Applying patches

7.1. Checking for Patches

Oftentimes when working on an issue with the Evergreen system, a patch will need to be located and verified. You may or may not know where the patch file lives; if not, you must locate and retrieve it.

 

If the patch was sent to the Open-ILS Dev mailing list:

When a developer states that a patch or other work was submitted to the mailing list, they are referring to the Open-ILS Dev Mailing List. If you do not subscribe to the mailing list, or do not have the exact submission, estimate the month and year in which the patch would have been submitted. You may browse a list by month and year here.

Then, you may browse these archives for the month by Thread, Subject, Author, or Date.

Once the proper thread regarding your issue is found, read the full thread. After the request, and any appropriate comments have been tendered, a "commit" has usually been applied, with the proper committing developer's name and changeset applied.

Verify Your Version:

Changesets are handled at http://svn.open-ils.org. When you view a changeset, there may be multiple entries for the same set of committed code; do not let this confuse you. You should have (generally) one commit for "trunk", which is the development branch, one for the major release (e.g. 1.6), and finally one for the minor release (e.g. 1.6.0). Do not be surprised to find underscores instead of the periods in these examples, like 1_6 or 1_6_0.

The proper selection is the version closest to the server being patched.

 

Review The Code First!

Always review the original file in a text editor first to ensure that the patch being applied is in fact different from the file already on the system. If the file still contains the portions being removed, or lacks the code being added, then proceed to the next article on SVN patch retrieval and installation.

7.2. from SVN

- obtain the SVN diff (either from a SVN command or the Trac 'Unified Diff')

- `cd` to a directory containing the file(s) to be patched

- apply the patch: `patch -Np# -i /home/esi/file.diff` (# = number of '/' to skip in the 'Index' line)

 

Ex: If I download a patch that has an index of:
"Index: branches/rel_1_6_0/Open-ILS/xul/staff_client/chrome/content/OpenILS/global_util.js"
I might cd into /openils/var/web/xul/server/, then run `patch -Np7 -i patch.diff` (skip '7' slashes, meaning apply the change to ./OpenILS/global_util.js).
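
Counting slashes by eye is error-prone, so the strip count can be sketched as a small function. The name strip_count is hypothetical; it takes the path from the "Index:" line and the path of the same file relative to the current directory (without a leading ./).

```shell
#!/bin/sh
# Hypothetical helper: print the number of leading path components that
# patch -p must strip so the Index path lines up with the local path.
# Usage: strip_count INDEX_PATH LOCAL_RELATIVE_PATH
strip_count() {
    index_path=$1 local_path=$2
    case $index_path in
        "$local_path") echo 0 ;;
        */"$local_path")
            prefix=${index_path%"$local_path"}
            # Count the slashes in the prefix; each one ends a component.
            slashes=$(printf '%s' "$prefix" | tr -cd '/')
            echo "${#slashes}" ;;
        *) echo "local path is not a suffix of the index path" >&2; return 1 ;;
    esac
}
```

For the example above, the index path together with OpenILS/global_util.js gives 7, matching the `patch -Np7` shown.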

When applying a patch to a customer system, we need to easily track
1) that the patch was applied; and
2) when it was applied.
We do this by inserting a value inside the database table
"config.upgrade_log".

For example, if I applied the changeset at
http://svn.open-ils.org/trac/ILS/changeset/13972/branches/rel_1_4_0, I
would make an entry into config.upgrade_log to indicate such...
insert into config.upgrade_log values ('SVN-13972',now());
This indicates that version "SVN-13972" has been applied, timestamped now.

8. Errors

8.1. MERGED_USER_IN_COLLECTIONS

When an attempt to merge two patron accounts ends in failure, one error presented may be "MERGED_USER_IN_COLLECTIONS".

This means that the patron in question has an existing row in money.collections_tracker. If the row merely exists, this is enough for the merging attempt to derail and not allow the merge to occur.

Remove the row by either updating the usr row, or deleting altogether if unnecessary. Then re-attempt the merge.

8.2. textcode:OFFLINE_FILE_ERROR

When this error occurs, the client is most likely attempting to upload an offline transaction.

It will appear to the user as such:

Check User Permissions

First, check to ensure that the particular staff user account has permissions set to upload offline transactions.

 

Grab Information About The Incident

The error, if you have a screenshot, should contain several important pieces of information.

The "description" as shown in the staff client will be replicated in the description column of offline.session.

The handle on the window should be in the format of [tab number]:[username]@[workstation].[domain]. That handle will appear as such:

The workstation as denoted within the tab handle is replicated in the workstation column of offline.script. In our example, we would enter the psql prompt and execute:

select * from offline.script where workstation like '%oyce%';

This is a wildcard search performed with the percent sign within the table that will present any records of workstations with names containing "oyce". Be careful, this is case-sensitive! This is why we scrape the initial capital letter off, to provide a unique string to the select that will limit or trim the results to a more palatable index.

 

Locate Offline Directory Paths

Two sets of directory structures exist;

/openils/var/data/offline//pending/[org] - for transactions pending

/openils/var/data/offline//archive/[org] - for transactions completed

In the database, this path was shown in the previous example. The number past the archive/pending path is generally the org unit ID as Evergreen sees it.

Verify Offline Directory Path Ownership

Change directory to the offline directory path, and as the opensrf user execute:

find /openils/var/data/offline/ -not -user opensrf -printf '%u %p\n'

Any files not owned by opensrf will be listed. If all is correct, the command prints nothing and the prompt simply returns.
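
For use in a script, the same check can be sketched as a function that also sets an exit status. The name not_owned_by is hypothetical; the find expression is the one above, and -printf is a GNU find extension.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that are not owned by the
# given user. Returns 0 when nothing is listed and 1 otherwise.
# Usage: not_owned_by USER DIRECTORY
not_owned_by() {
    owner=$1 dir=$2
    found=$(find "$dir" -not -user "$owner" -printf '%u %p\n')
    [ -z "$found" ] && return 0
    printf '%s\n' "$found"
    return 1
}

# Possible usage:
#   not_owned_by opensrf /openils/var/data/offline/ || echo "fix ownership"
```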

 

Verify Apache Process Ownership

Apache is generally configured to be executed as root, but subsequent processes are owned by the opensrf user. This is validated in 1.6 and newer systems by examining /etc/apache2/envvars as root.

There is a specific item to validate, and this is to ensure that APACHE_RUN_USER=opensrf, rather than the system default of APACHE_RUN_USER=www-data.

If it is in fact attributed to www-data, this value must be changed to opensrf, and the Apache services restarted.
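
Reading the value out of envvars can be sketched as follows. The function name apache_run_user is hypothetical, and it assumes the file assigns the variable on a single line, with or without a leading "export".

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to APACHE_RUN_USER in an
# envvars-style file. Commented-out lines are ignored; if the variable is
# assigned more than once, the last assignment is printed.
# Usage: apache_run_user FILE
apache_run_user() {
    sed -n 's/^[[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}APACHE_RUN_USER=//p' "$1" | tail -n 1
}

# Possible usage as root:
#   [ "$(apache_run_user /etc/apache2/envvars)" = "opensrf" ] || echo "check envvars"
```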

 

Verify Proper Mounts of Paths (For Multi Brick or Distributed Systems)

If the system is a multi-brick or distributed system, log into each brick and execute 'mount -a' to ensure all mount points are properly connected. This command is usually executed by default, but if for any reason it fails to connect properly to the /openils directory, many issues may result (such as this one). The command looks at the list of mountpoints, and automatically connects to any not previously connected.

 

Ensure Existence of All Intended Files

When an offline transaction is uploaded for processing, a [workstation].log file is created in the /openils/var/data/offline//pending/[org]/[session]/ directory. You must validate that all files marked for processing from the staff client actually exist, or the same "an offline file could not be created or accessed" error will occur. This would be logical, as one or more of the files listed or marked for processing does not, in fact, exist on the server.

At this point, you have most likely found the issue and corrected it. Offline transactions should process normally.

8.3. DBD::Pg::st execute failed: ERROR: relation "reporter.classic_current_circ" does not exist at /openils/bin/clark-kent.pl

Encountered in ticket 12577, Evergreen 1.4.0.7

 

If this message is received, it is most likely that the system in question is using reports from PINES.

 

In the PINES system, certain database views exist that are not found in the stock Evergreen system. To create this view, download the attached file to the database server and enter into a psql session. The name of the file as attached is rccc.sql. At the psql prompt, type `\i rccc.sql`.

The following message should be received if done correctly:

evergreen=# \i rccc.sql
CREATE VIEW
evergreen=#

Run the report in question again to verify active state.

8.4. NFS Stale File Handle error and solution

Sometimes, NFS can result in weird problems. For example, NFS-mounted directories sometimes contain stale file handles. If you run a command from a shell prompt you will see an error similar to this one for offline transactions:
$ cat foo.log
foo.log: Stale NFS file handle

The book Managing NFS and NIS, 2nd Edition describes stale filehandles as follows:


A filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host, while your client still holds an active reference to the object. A typical example occurs when the current directory of a process, running on your client, is removed on the server (either by a process running on the server or on another client).

So this can occur if the directory is modified on the NFS server, but the directory's modification time is not updated.

How do I fix this problem?

Find a good base directory mount point and execute the following:

$ mount -o remount [directory you selected]

This basically refreshes the NFS mount across all mounted points.

8.5. Can't call method "opac_visible" on an undefined value at org_tree_html_options.pl

The org_tree_html_options.pl script updates the slimpac library list.  By itself, a failure here has no effect if the brick isn't used to supply an OPAC for the system.  However, the cause of the failure is that the open-ils.actor service (and several others) has stopped functioning on the brick, requiring a service restart on that box along with manually deleting the following PID files from /openils/var/run before resuming services:

-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.permacrud_unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.ingest-unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.actor_unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 vandelay_unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.search_unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.reporter_unix.pid
-rw-r--r-- 1 opensrf opensrf    5 2010-02-22 06:40 open-ils.circ_unix.pid
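
Before deleting PID files it helps to confirm that the processes they name are really gone. The following is a sketch only; the function name stale_pidfiles is hypothetical, and it treats a PID file with no digits in it as stale.

```shell
#!/bin/sh
# Hypothetical helper: print each *.pid file in a directory whose recorded
# process ID no longer refers to a live process.
# Usage: stale_pidfiles DIRECTORY
stale_pidfiles() {
    for f in "$1"/*.pid; do
        [ -f "$f" ] || continue
        pid=$(tr -cd '0-9' < "$f")
        # A PID file without any digits cannot name a process; report it.
        [ -n "$pid" ] || { printf '%s\n' "$f"; continue; }
        # kill -0 sends no signal; it only tests whether the PID exists.
        # The /proc test covers live processes owned by another user.
        kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ] || printf '%s\n' "$f"
    done
}

# Possible usage as the opensrf user:
#   stale_pidfiles /openils/var/run
```

The function only lists candidates; removing them and restarting services remains a manual step, as described above.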

8.6. Apache spinning at 100%

Occasionally an Apache process will "spin", consuming 100% of the CPU.  One possible explanation is that the KeepAliveTimeout is set higher than 1.

 

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15  <-- BAD
KeepAliveTimeout 1  <-- GOOD

 

This is bad, as "all Evergreen communication with clients on the Internet occurs via http.  If every request hogs an Apache child process for 15 seconds, it will run out of child processes real quick like." (quote by Bill Erickson)

 

It is unknown at this time whether there is an error in the log files.
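
Checking the configured value can be sketched as a small function. The name keepalive_timeout is hypothetical, and the configuration file path in the usage comment is an assumption about a Debian-style layout.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last uncommented
# KeepAliveTimeout directive in an Apache configuration file.
# Directive names are matched without regard to case.
# Usage: keepalive_timeout FILE
keepalive_timeout() {
    awk 'tolower($1) == "keepalivetimeout" { value = $2 }
         END { if (value != "") print value }' "$1"
}

# Possible usage (the path is an assumption):
#   [ "$(keepalive_timeout /etc/apache2/apache2.conf)" = "1" ] || echo "check KeepAliveTimeout"
```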

8.7. missing = in XML attribute

Background

This error is related to offline transactions.

When first encountering this error in the Evergreen 1.6.X series, the following portions may appear in your error debug output:

{
"message":"missing = in XML attribute",
"fileName":"http://[your opac IP]/opac/common/js/JSON_v1.js",
"lineNumber":14,

)@http://[ your opac IP]/xul/rel_1_6_0_0/server/main/JSAN.js:232\nmy_init()@http://208.119.0.200/xul/rel_1_6_0_0/server/admin/offline_manage_xacts.xul:22\nonload(
[
object Event
]
)@http://[ your opac IP]/xul/rel_1_6_0_0/server/admin/offline_manage_xacts.xul:1\n",
"name":"SyntaxError"
}

Recreate The Issue

First, recreate the issue and look for the above parameters. If they are present, then the offline interface cannot connect to the database for processing. The reason must now be discovered.

Note the timestamp of the error, and match it in the ap_error.log with the following:

grep hh:mm ap_error.log

You could also tail the file with:

tail -50 ap_error.log

Look for this:

DBI connect('host=[some IP];dbname=[your DB];port=5432','[DB user]',...) failed: could not connect to server: No route to host,

Verify The Offline DB IP

Open /openils/conf/opensrf.xml in a text editor and scroll to the database settings. Verify the IP of the database, and write it down.

Compare To Offline Settings

In /openils/conf, you should also find a file called offline-config.pl. It might appear as such:

$main::config{base_dir} = '/openils/var/data/offline/';
$main::config{bootstrap} = '/openils/conf/opensrf_core.xml';
$main::config{dsn} = 'dbi:Pg:host=[ wrong IP address for DB];dbname=[your DB name];port=5432';
$main::config{usr} = '[ your DB user]';
$main::config{pw} = '[ your DB user's password]';

If Not Resolved, or Errors Are Extensive

If any of these settings are wrong, correct them, and go forth merry. The change takes effect immediately and does NOT require a restart of any kind.
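
To compare the offline database host against the IP noted from opensrf.xml, the host can be pulled out of the dsn line. This is a sketch; the function name offline_dsn_host is hypothetical, and it assumes the dsn line has the host=...; form shown in the sample file above.

```shell
#!/bin/sh
# Hypothetical helper: print the host= value from the dsn line of an
# offline-config.pl-style file.
# Usage: offline_dsn_host FILE
offline_dsn_host() {
    sed -n '/config{dsn}/s/.*host=\([^;]*\);.*/\1/p' "$1"
}

# Possible usage:
#   offline_dsn_host /openils/conf/offline-config.pl
```

If the printed host differs from the database IP in opensrf.xml, that mismatch is the likely cause of the "No route to host" entry in the log.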

If this does not resolve the issue, then check on your database server for pg_hba.conf.

It will appear similar to the below:

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
# IPv6 local connections:
host    all         all         ::1/128               trust

# Foo test (Your servers)
host    [dbname]        [dbuser]        [foo's IP]/[foo's netmask]       md5

If your IP is not listed, along with your database name (e.g. evergreen) and your database user (usually evergreen), then add it and save with your favorite text editor.

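For the change to take effect, the database must re-read the file. A pg_ctl reload, as shown in section 6.2, is normally sufficient and avoids a full database restart.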

 

Another cause of this error is an offline-config.pl file in /openils/conf/ that is missing or misconfigured. A copy of this file is attached in the Downloads.

8.8. TypeError: n.application_perm is not a function

When this error is experienced, check the permission groups for inaccuracies. This can be done in the database by looking at the table permission.grp_tree.

Ensure the following:

  • All items except Users have a logical parent
  • No items link each other as parents, i.e., Item A has a parent of Item B, and vice versa
  • No items link to themselves as parents
To correct a Users group that has a parent listed, enter the following:
update permission.grp_tree set parent = null where id = 1;
To correct an individual group that has the wrong parent:
update permission.grp_tree set parent = [proper parent id] where id = [id of offending group];
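
The three conditions listed above can be checked mechanically. The following is a sketch under stated assumptions: the function name check_grp_tree is hypothetical, and it expects "id|parent" rows with an empty parent for NULL, such as psql -At -c 'select id, parent from permission.grp_tree' would produce.

```shell
#!/bin/sh
# Hypothetical helper: read "id|parent" rows (empty parent means NULL) on
# standard input and report:
#   - groups that list themselves as parent
#   - pairs of groups that list each other as parent
#   - the count of parentless groups, when that count is not exactly one
check_grp_tree() {
    awk -F'|' '
        { parent[$1] = $2 }
        $2 == ""  { roots++ }
        $1 == $2  { print "self-parent: " $1 }
        END {
            for (id in parent) {
                p = parent[id]
                # Report each mutual pair once, from the lower id.
                if (p != "" && p != id && (p in parent) && parent[p] == id && id + 0 < p + 0)
                    print "mutual parents: " id " and " p
            }
            if (roots != 1) print "parentless groups: " (roots + 0)
        }'
}
```

No output means none of the three problems was found; longer parent loops (A to B to C back to A) are not detected by this sketch.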