Glimpse Search Form Parameters

This page describes in detail the parameters and options for a Glimpse search form. All of these parameters are used in every Glimpse search (if you omit an optional parameter, its default value is used).

What's a Parameter?

A parameter, both in computer software and in everyday language, simply means an "option," or more specifically something which can be varied in order to vary the result. You can control the behavior of searches of your index(es) by tailoring the settings of these parameters (options) to your requirements.

How to Set a Search Parameter

Each search form parameter is set in one of the following four ways (required parameters cannot be set by the first method):

  1. By "default" - By not specifying a parameter, you are implicitly specifying that the default value for that parameter be used. For example, if you do not specify a value for the glimpse_case_sensitive parameter, all searches using your form will automatically be case insensitive, because that is the default for that parameter. The default setting of each parameter is described below. All parameters except the first three listed below under "Essential Search Parameters" are optional and thus have a default value.
  2. Hidden field - If you want to override the default and use the same value for every search, set the parameter to a fixed value with a hidden field. For example, if you wanted every search to be limited to a maximum of 50 matches without exception, you'd code the following hidden field in your form:
    
    <INPUT TYPE=hidden NAME=glimpse_max VALUE=50>
    
  3. Form processor application - If you invoke the search from a form processor application, set the search parameters in the form processor configuration file.
  4. Let the user decide - In some cases, you may wish to give the users of your search form the ability to choose the setting for a parameter. For example, the following HTML in your search form would create a checkbox on your form which the user could use to turn case sensitivity on or off for any given search:
    
    Case sensitive search:
    
    <INPUT TYPE=checkbox NAME=glimpse_case_sensitive VALUE=on>

A Few More Notes on Setting Parameters

Boolean parameters

Boolean parameters (not to be confused with boolean searches) are parameters which can only toggle on or off. Typically, boolean values are referred to as "TRUE" or "FALSE," but since a boolean value is just an on/off value, any two words will work (the choice is arbitrary).

The WebCom Glimpse facility uses "on" and "off" to represent boolean values. What this means is: for all search parameters which are boolean values (yes or no options), set the parameter to "on" if you want to use that option. If you do not want to use the option, set it to blank or anything other than "on" (such as "off").
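
As a small illustration of the on/off convention, the following hidden fields fix two boolean parameters for every search. This is only a sketch; the particular choice of parameters and values is an assumption made for the example, not a recommended configuration:

```html
<!-- Every search is case sensitive -->
<INPUT TYPE=hidden NAME=glimpse_case_sensitive VALUE=on>
<!-- Never show page headers on the result page (any value other than "on" turns the option off) -->
<INPUT TYPE=hidden NAME=glimpse_show_head VALUE=off>
```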

Parameter values are not case sensitive

For instance, you can turn case sensitivity for searches on by setting the glimpse_case_sensitive parameter to "ON", "on", "On", or even "oN".

Essential Search Parameters

The following search parameters are essential. The first three are required and therefore have no default value. The glimpse_directory parameter is optional but is included under "essential" parameters because, although the default is most frequently used, it is essential that you consider the parameter carefully. The button, while not really a parameter, is listed here to remind you that it's required:

Commonly Used Optional Parameters

glimpse_case_sensitive

Case sensitivity means whether or not Glimpse makes a distinction between upper and lower case letters in checking for a match. In a case sensitive search, "Cat", "cat", and "CAT" are all considered different, and a search for "cat" would only match the second version. In a case insensitive search, "cat" would match all three words.

If you do not set this parameter, the search is not case sensitive. If you do want to provide for case sensitive searching, you'll need to set this parameter to the value "on" using one of the methods other than the default described under "How to Set a Search Parameter".

glimpse_show_head

This parameter is one of the parameters which controls the format of the result page (the list of matches the user sees after submitting a search).

If this parameter is "on", then the "Header" information for each matching file will appear on the result page. Please see the documentation on the result page for more information on the contents of a page "header" (in most cases the header is the web page TITLE, which will be hotlinked to the page itself, and the description from the description META tag in the HTML header of the web page).

In most cases, you'll want this value to be "on". If you don't want the header to show on the result page, you'll need to set this parameter to the value "off" using one of the methods other than the default described under "How to Set a Search Parameter".

glimpse_show_lines

This parameter is one of the parameters which controls the format of the result page (the page of matches the user sees after submitting a search).

If this parameter is "on", each matching line in each matching file is displayed (with the matching word(s) in bold).

If you want the matching lines in a file to show on the result page (normally you'd just show the page header), you'll need to set this parameter to the value "on" using one of the methods other than the default described under "How to Set a Search Parameter".

NOTE: glimpse_show_lines and glimpse_count_matches are mutually exclusive parameters; you may set either one to "on" but not both at the same time. This is due to a limitation in Glimpse; it will output the number of matching lines in a file, or the lines themselves, but it won't do both. If you set both glimpse_show_lines and glimpse_count_matches to "on", the search will return an error message.

WARNING: Setting this parameter to "on" forces Glimpse to scan the entire contents of every matched file (whereas if it's off, it need only read the file header). For this reason, setting this parameter to "on" will degrade the speed of searches.

glimpse_count_matches

This parameter is one of the parameters which controls the format of the result page (the page of matches the user sees after submitting a search).

If this parameter is "on", then for each matching file the number of matches is displayed in parentheses after the page title. Since the match count is displayed as part of the header, this option only works if you also have the glimpse_show_head parameter on (it's on by default).

NOTE: glimpse_show_lines and glimpse_count_matches are mutually exclusive parameters; you may set either one to "on" but not both at the same time. This is due to a limitation in Glimpse; it will output the number of matching lines in a file, or the lines themselves, but it won't do both. If you set both glimpse_show_lines and glimpse_count_matches to "on", the search will return an error message.

Because this parameter is mutually exclusive with the glimpse_show_lines parameter, and the default value for glimpse_show_lines is "on", the default value for glimpse_count_matches is of necessity "off".

WARNING: Setting this parameter to "on" forces Glimpse to scan the entire contents of every matched file (whereas if it's off, it need only read the file header). For this reason, setting this parameter to "on" will degrade the speed of searches.

glimpse_max

This parameter allows you or the user to set a maximum number of matches. This can be useful if your site is large and could potentially return hundreds of matches if the search terms are too general.

This parameter must be set to a number greater than or equal to 1. If you set this parameter to a number, Glimpse will stop outputting matches after it has displayed that many, and will show a message informing the user that the search was suspended at that number and that there may be additional matches which were not shown.

The default is that this parameter has no value, meaning that Glimpse will return the entire list of matches no matter how large. You can explicitly set the parameter to unlimited by setting it to blank or any value other than a positive number (such as "off").
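
If you would rather let the user choose the limit, one possible approach is a selection list. The sketch below assumes the limits 10 and 50 purely for illustration; the blank value relies on the behavior described above, where a blank setting means unlimited:

```html
Maximum number of matches:
<SELECT NAME=glimpse_max>
<OPTION VALUE=10>10
<OPTION VALUE=50 SELECTED>50
<!-- A blank value leaves the number of matches unlimited -->
<OPTION VALUE="">Unlimited
</SELECT>
```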

glimpse_header and glimpse_footer

Please Note: The glimpse_header and glimpse_footer parameters have changed! As of April 14th 1998, WebCom has changed the functionality of these two Glimpse parameters. In brief, these parameters are now obsolete, and Glimpse now automatically looks for files called glimpse_header.html and glimpse_footer.html in the directory that you have designated as the glimpse_directory. For more details, please read our Headers and Footers documentation.

glimpse_domain

This parameter allows you to specify the root domain name to be used in the result page (the list of matches the user sees after a search). All links to search matches will contain the domain specified.

If you do not set this parameter, the URLs in the search result will all start with http://webcom.com/~webcom. If you have registered your own domain name, you may want to set this parameter to www.yourdomain.com (omit the http:// prefix), so that in the search result, the URLs will reflect your domain name.

This is a purely cosmetic feature. Your search form will work fine with the default value, webcom.com/~webcom, whether or not you have a domain name registered with WebCom. Setting this parameter to your own domain name only works properly if your domain name is hosted at WebCom, or if your domain name is hosted elsewhere but the subdomain (e.g. www.yourdomain.com) is configured to resolve to the WebCom web server.
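
For example, a hidden field along the following lines would make every link on the result page use your own domain. Here www.yourdomain.com is a placeholder to be replaced with your actual domain name:

```html
<!-- Note: no http:// prefix in the value -->
<INPUT TYPE=hidden NAME=glimpse_domain VALUE="www.yourdomain.com">
```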

glimpse_no_bold

This parameter controls the bolding of search keywords in the search result.

Typically you will set glimpse_no_bold to "on" when you use glimpse_show_lines and want the matching text to appear in normal type. The default is "off", which causes the search terms in the search result to appear in bold.
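
A minimal sketch of this combination, assuming you want matching lines shown in plain type for every search, might look like this:

```html
<!-- Show each matching line on the result page -->
<INPUT TYPE=hidden NAME=glimpse_show_lines VALUE=on>
<!-- Do not bold the search terms within those lines -->
<INPUT TYPE=hidden NAME=glimpse_no_bold VALUE=on>
```

Remember that glimpse_count_matches must not also be set to "on" in a form that turns glimpse_show_lines on.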

Less Commonly Used Optional Parameters

glimpse_type

Whole Word Search (default, or glimpse_type=w)

The default search type is a boolean whole word search. Since this is the default search type, you need not bother setting this parameter if this is the type of search you'd like to use. This is a very fast and versatile word search, and will be more than adequate for virtually all search needs.

A whole-word search means that the search only finds whole words matching the input words. For example, a search for "car" would find all files containing the word "car", but not files containing "cartoon" or "NASCAR".

This search type also supports boolean searches.

A boolean search permits searching for several words, and specifying AND and/or OR between each word. Glimpse uses semicolon (;) for AND, comma (,) for OR, and curly brackets ({}) for parentheses to override boolean precedence. For example, "find all files with web AND page" would be "web;page", and "find all files with web OR page" would be "web,page". "Find all files with web and either page or master" would be expressed as "web;{page,master}". Please see the documentation on boolean searches for more detailed info.

(If you read the glimpse manual pages at the University of Arizona, this type of search simply uses the -w switch on the Glimpse command).

Glimpse can also do other search types and these are available for your use in case you have a special requirement.

Simple Substring Search (glimpse_type=ss)

A simple substring search differs from the default search type in that the input patterns will match partial words as well as whole words. A search for "car" would match "car", "cartoon", "NASCAR", etc.

This type of search can at times be significantly slower than the default search mode because it often forces Glimpse to scan both the Glimpse index as well as the full text of all potential matches.

This search type, in a nutshell, searches for exactly what the user types in the search field. For example, if the user typed "web publishing" in the search field, Glimpse would look for exactly that sequence of letters in the file (web (space) publishing, not for all files with either "web" or "publishing," or "web" and "publishing").

Regular expressions are not supported by this type of search either; a search for "Web#" will find all occurrences of Web followed by a pound sign.

Boolean searches are not supported by this search type, since a search for "web;publishing" looks for exactly that sequence of characters, semicolon included.

Note: If you read the Glimpse manual pages at the University of Arizona, the technical difference between this search type and the default type is that this type does not set the -w switch, but does set the -k switch.

General Search (glimpse_type=g)

This search differs from the default type in that it will match partial words rather than whole words only.

It is therefore more general than the default search (in fact, it is the most general type of the three). However, it may not be quite as useful as the default (whole word) search, because:

  • This search type tends to slow down the search. More often, Glimpse must scan the whole text of the file to confirm potential matches, whereas a whole word search allows Glimpse to stay within the index most of the time, only going to the file itself for certain matches if matching lines or match counts are requested.

  • This search may tend to dilute the precision of searches. For instance, if the user were to search for "import" this search type may return a large number of files containing the word "important", and may even obscure the actual matches from the user (perhaps there was one full word occurrence of "import" but it was not noticed in the large search result).

    This search type supports boolean searches.

    If you want to search for a special character which would ordinarily be interpreted by Glimpse as having a special meaning (such as a semicolon or pound sign), you can prevent Glimpse from treating the character specially by preceding it with a backslash "\". E.g., to search for pound signs, use "\#" (WARNING: Glimpse is optimized for searching for natural language (ordinary) words. Searching for symbols tends to slow Glimpse down, sometimes significantly.)

    In short, this search has the widest scope but therefore tends to be slower and overly general. We recommend the default search type for general text searching applications.

    Note: If you read the Glimpse manual pages at the University of Arizona, this search type invokes Glimpse with neither the -w nor the -k switches.
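
If you want to let users pick among the three search types described above, one possible form fragment is a group of radio buttons. This is only a sketch; the label text is an assumption, and the whole word type is pre-selected to mirror the default:

```html
Search type:
<INPUT TYPE=radio NAME=glimpse_type VALUE=w CHECKED> Whole word (fastest)
<INPUT TYPE=radio NAME=glimpse_type VALUE=ss> Simple substring
<INPUT TYPE=radio NAME=glimpse_type VALUE=g> General
```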

    glimpse_max_errors

    Glimpse supports the ability to find near word matches as well as exact word matches. Glimpse can compare your search pattern with the words in your index and quantify the difference between the two in terms of number of spelling errors.

    By default, Glimpse only returns exact matches. However, if you set this parameter to a non-zero value, Glimpse will allow words to match that have up to that many spelling errors. For instance, if you were to set this parameter to 1, then the search would return not only all files containing exact matches, but also all files containing matches with at most one spelling error (either in the search word or the word in the file).

    For example, the words "hippopotamus", "hippopotumus", "hipopotamus" are all treated by Glimpse as the same word, provided Glimpse is told to permit one spelling error.

    By default, this parameter is zero. You may set it to any number from 0 to 8.

    Issues to consider in using glimpse_max_errors

    1. Allowing misspellings tends to slow down searches since Glimpse must essentially scan the entire index checking for near matches within the specified tolerance. However, provided you use this parameter with a whole word search, it should still be fast enough since whole word searches almost always allow Glimpse to determine the result set from the index alone.
    2. Setting this parameter, especially to 2 or greater, can dilute the accuracy of searches by returning too many near-matches. If set too high, the matches which would have been of interest to the searcher could potentially be obscured by the sheer flood of near matches returned.
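
Given the trade-offs above, you might offer users only a small range of tolerances. The following sketch assumes a cap of 2 errors, which is a choice made for this example rather than a limit of the parameter:

```html
Allow misspellings:
<SELECT NAME=glimpse_max_errors>
<OPTION VALUE=0 SELECTED>None (exact matches only)
<OPTION VALUE=1>1 spelling error
<OPTION VALUE=2>2 spelling errors
</SELECT>
```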

    glimpse_fileset

    This parameter allows you to select the fileset among which to search. Even if your Glimpse index contains hundreds of non-html text files, you could specify that a particular search be restricted to html files only by setting this parameter to ".*\.html".

    This parameter expects a regular expression defining the set of files to be searched. It does not support the 'shell wildcard'. For example, to restrict the search to all files ending with a .html extension, you'd need to set glimpse_fileset to ".*\.html", or using the Glimpse '#' extension to regular expression syntax (in glimpse, # is the same as .* in standard regexp syntax), "#\.html". "*.html" will not work since it is not a regular expression.

    This parameter can be useful to restrict searches to specific groups of files, such as all those in a certain directory, as an alternative to creating a separate index for each directory you want to be able to search individually, thus conserving disk space. Without this feature, if you wanted to have an index of everything but also allow searches of specific subdirectories, you'd have to create one large index in the top directory, as well as an individual index in each of the subdirectories, thus duplicating index information.
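
As an illustration, the hidden field below restricts every search from a form to files ending in .html, using the regular expression given above. Quoting the VALUE is a precaution taken in this sketch because the expression contains special characters:

```html
<!-- Regular expression, not a shell wildcard; "#\.html" would be equivalent -->
<INPUT TYPE=hidden NAME=glimpse_fileset VALUE=".*\.html">
```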

    glimpse_not_matched

    This parameter contains the message you want returned to the browser if no matches are found during the search. The most useful application of a custom message is for foreign languages. You can create multiple search forms; one for each language your site supports.

    Here is an example of how to use the parameter:

    <INPUT TYPE=HIDDEN NAME=glimpse_not_matched VALUE="Please try your search again, no matches were found">
    

    Rarely Used/Obscure Search Parameters

    glimpse_boolean_line

    This parameter is only relevant if the type of search you select with the glimpse_type parameter supports boolean searches (the default search type does), and if the user searches for more than one word and uses a boolean operator (comma or semicolon) to conjoin the words.

    By default, the scope of a boolean search is the file. What this means is that if you search for "web AND page" ("web;page"), you are asking Glimpse to find all files which contain both the word "web" and the word "page" (the same is true for an OR search, or a complex search with ORs and ANDs (commas and semicolons)).

    By setting the glimpse_boolean_line parameter to "on", you can change the scope of a boolean search to the line level. What this means is that (to reuse the previous example), to search for "web;page" would be to ask for all files having at least one line which contains both words. In other words, instead of both words having to exist anywhere in the file in order for it to match, they must both exist on the same line. The boolean scope is reduced from the file level to the line level.

    One situation in which this feature is useful is if you're using Glimpse to create a keyword index or some other specialized form of index in which each line of the file represents a single entry in the index. For example if you wanted a keyword index of products, you could have one file with one line for each product, with each line containing the keywords for that product and a link to that product's web page. In a search of such an index, you'd want to restrict the scope of a boolean search to each line of the file.
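
For a keyword index of this kind, the search form might fix the relevant parameters with hidden fields. The sketch below assumes you also want the matching entries themselves displayed, which is why glimpse_show_lines is turned on as well:

```html
<!-- Both words of an AND search must appear on the same line (entry) -->
<INPUT TYPE=hidden NAME=glimpse_boolean_line VALUE=on>
<!-- Display each matching line, i.e. each matching index entry -->
<INPUT TYPE=hidden NAME=glimpse_show_lines VALUE=on>
```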

    glimpse_rec_delim

    Glimpse has to view a text file as a set of lines in either of the following two situations:

    1. In case you ask to see each matching line in the file.
    2. In case you set the glimpse_boolean_line parameter on to restrict the scope of a boolean search to the line level.

    By default, as you'd expect, Glimpse divides the file into lines at each "newline" character found in the file. However, you can change the line break character to be anything you want.

    One situation in which this feature is useful is if you're using Glimpse to create a keyword index or some other specialized form of index in which each line of the file represents a single entry in the index. For example if you had a keyword index for each product, you could have one file with one line for each product, with each line containing the keywords for that product and a link to that product's web page.

    Suppose instead of just a link to each product, you wanted to include a thumbnail gif and brief description of each product which qualified in the search. As you may know, HTML does not pay any attention to line breaks (since the browser reformats everything anyway), so you could include the link to the thumbnail .gif and the text of the description all on one line so that when that line was provided as a match, the thumbnail .gif and description would appear.

    However, you would probably find creating and editing this file to be quite cumbersome, having to put everything on one extremely long line. Using glimpse_rec_delim (which stands for record delimiter; record and line are used interchangeably here), you could change the record delimiter/line break to be, say, "<!--X-->" instead of the newline character.

    This way, you could set up the product information in multiple lines, and separate each product with "<!--X-->". Glimpse would then treat everything between each "<!--X-->" as one line for purposes of checking for a line-level match or outputting a matching line (and thus when a matching line was output, everything between the "<!--X-->" delimiters would be output). The record delimiter itself in this example is an HTML comment, so it does not actually appear in either the output of a Glimpse search or if the page were loaded directly in its entirety.

    NOTE: If your record delimiter value starts with "!--", the Glimpse search facility assumes you're using an HTML comment for your delimiter and automatically surrounds your record delimiter with angle brackets to determine the actual record delimiter used in the search. So in the above example, you'd set glimpse_rec_delim to "!--X--", resulting in a record delimiter of <!--X-->. This is done because most browsers get confused when angle brackets are coded as a value in an HTML form, since angle brackets are used to delimit HTML tags. There is no other way to use angle brackets as a VALUE in a form field.

    The default for this parameter, as mentioned, is the newline. To explicitly set the delimiter to newline, set this parameter to "$" (this is the regular expression "meta" character meaning "end of line"; for some reason, Glimpse uses this instead of "\n" to signify line breaks for this parameter only).

    A record delimiter may be at most 8 characters long.
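
Putting the product index example together, the search form could carry hidden fields like the following. This is a sketch under the assumptions of the example above: the delimiter is the HTML comment <!--X-->, entered without its angle brackets as described in the NOTE:

```html
<!-- Records are separated by <!--X-->; the angle brackets are added automatically -->
<INPUT TYPE=hidden NAME=glimpse_rec_delim VALUE="!--X--">
<!-- Output each matching record and apply boolean searches per record -->
<INPUT TYPE=hidden NAME=glimpse_show_lines VALUE=on>
<INPUT TYPE=hidden NAME=glimpse_boolean_line VALUE=on>
```

The indexed product file might then be laid out as below. The product names, file names, and keywords are hypothetical and serve only to show how multi-line records are separated:

```html
<!--X-->
<A HREF="blue_widget.html">Blue Widget</A>
<IMG SRC="blue_widget_thumb.gif" ALT="Blue Widget">
A small blue widget for everyday use.
Keywords: widget blue small
<!--X-->
<A HREF="red_gadget.html">Red Gadget</A>
<IMG SRC="red_gadget_thumb.gif" ALT="Red Gadget">
A large red gadget for the workshop.
Keywords: gadget red large
<!--X-->
```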
