module Pxp_reader: sig .. end
Purpose of this module: The Pxp_reader module allows you to specify exactly how external identifiers (SYSTEM or PUBLIC) are mapped to files or channels. This is normally only necessary for advanced configurations, as the functions from_file, from_channel, and from_string in Pxp_types often suffice.
There are two ways to use this module. First, you can compose the
desired behaviour by combining several predefined resolver objects
or functions. See the example section at the end of the file.
Second, you can inherit from the classes (or define a resolver class
from scratch). I hope this is seldom necessary as this way is much
more complicated; however it allows you to implement any magic.
exception Not_competent
exception Not_resolvable of exn
type lexer_source = {
   lsrc_lexbuf : ...;
   lsrc_unicode_lexbuf : ...;
}
... lsrc_lexbuf, or lsrc_unicode_lexbuf!
Example of the latter:
Resolver r reads from file:/dir/f1.xml
<tag>some XML text
&e;                  -----> Entity e is bound to "subdir/f2.xml"
</tag>                      Step (1): let r' = "clone of r"
                            Step (2): open file "subdir/f2.xml"
r' must still know the directory of the file r is reading, otherwise it would not be able to resolve "subdir/f2.xml" = "file:/dir/subdir/f2.xml".
Actually, this example can be coded as:
let r = new resolve_as_file () in
let lbuf = r # open_in "file:/dir/f1.xml" in
(* ... read from lbuf ... *)
let r' = r # clone in
let lbuf' = r' # open_in "subdir/f2.xml" in
(* ... read from lbuf' ... *)
r' # close_in;
(* ... read from lbuf ... *)
r # close_in
class type resolver = object .. end
All resolve_read_* classes are now deprecated. The new classes resolve_to_* are based on the Netchannels classes, which generalize input streams.
Examples: To read from an in_channel, use:
let obj_channel = new Netchannels.input_channel in_channel in
new Pxp_reader.resolve_to_this_obj_channel obj_channel
To read from a string, use:
let obj_channel = new Netchannels.input_string string in
new Pxp_reader.resolve_to_this_obj_channel obj_channel
Furthermore, the new classes use the resolver_id record as generalized names for entities. This solves most problems with relative URLs.
The "Anonymous" ID: In previous versions of PXP, a resolver bound to the Anonymous ID matched the Anonymous ID. This is no longer true. The algebra has been changed such that Anonymous never matches, not even itself.
Example: The new resolver
let r = new resolve_to_this_obj_channel ~id:Anonymous ch
will never accept any ID. In contrast, the old and now deprecated resolver
let r' = new resolve_read_this_channel ~id:Anonymous ch
accepted the ID Anonymous in previous versions of PXP.
The rationale behind this change is that Anonymous now acts like an "empty set", and not like a concrete element. You can use Private to create as many concrete elements as you want, so there is actually no need for the old behaviour of Anonymous.
Note that even the resolver classes provided for backwards compatibility
implement this change (to limit the confusion). This means that you
might have to change your application to use Private instead of
Anonymous.
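The "empty set" behaviour of Anonymous can be illustrated with a small toy model. This is only a sketch: the types toy_id and toy_private and the functions matches and fresh_private are hypothetical stand-ins invented for this example, and are not the definitions from Pxp_core_types.

```ocaml
(* Toy model only: these types are hypothetical stand-ins, not the
   definitions from Pxp_core_types. *)
type toy_private = int            (* stands in for an opaque private ID *)

type toy_id =
  | Toy_system of string
  | Toy_anonymous
  | Toy_private of toy_private

(* [matches bound requested] is true if a resolver bound to [bound]
   accepts a request for [requested]. Anonymous behaves like an empty
   set: it matches nothing, not even itself. *)
let matches (bound : toy_id) (requested : toy_id) : bool =
  match bound, requested with
  | Toy_anonymous, _ | _, Toy_anonymous -> false
  | Toy_system a, Toy_system b -> a = b
  | Toy_private a, Toy_private b -> a = b
  | _ -> false

(* A counter yields as many distinct private IDs as needed. *)
let fresh_private : unit -> toy_id =
  let counter = ref 0 in
  fun () -> incr counter; Toy_private !counter
```

In this model, a resolver that should accept exactly one request is bound to a fresh private ID rather than to Toy_anonymous, which mirrors the advice to use Private instead of Anonymous.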
class resolve_to_this_obj_channel :?id:Pxp_core_types.ext_id -> ?rid:Pxp_core_types.resolver_id -> ?fixenc:Pxp_core_types.encoding -> ?close:(Netchannels.in_obj_channel -> unit) -> Netchannels.in_obj_channel ->
resolver
This resolver can only be used once (because the in_obj_channel can only be used once). If it is opened a second time (either in the base object or a clone), it will raise Not_competent.
If you pass the ~fixenc argument, the encoding of the channel is set to the passed value, regardless of any auto-recognition or any XML declaration.
When the resolver is closed, the function passed by the ~close
argument is called. By default, the channel is closed
(i.e. the default is: ~close:(fun ch -> ch # close_in)).
type accepted_id =
Netchannels.in_obj_channel * Pxp_core_types.encoding option *
Pxp_core_types.resolver_id option
If None is passed as encoding option, the standard autodetection of the encoding is performed.
If None is passed as resolver_id option, the original ID is taken
unchanged.
class resolve_to_any_obj_channel :?close:(Netchannels.in_obj_channel -> unit) -> channel_of_id:(Pxp_core_types.resolver_id -> accepted_id) -> unit ->
resolver
When the resolver is closed, the function passed by the ~close
argument is called. By default, the channel is closed
(i.e. the default is: ~close:(fun ch -> ch # close_in)).
class resolve_to_url_obj_channel :?close:(Netchannels.in_obj_channel -> unit) -> url_of_id:(Pxp_core_types.resolver_id -> Neturl.url) -> base_url_of_id:(Pxp_core_types.resolver_id -> Neturl.url) -> channel_of_url:(Pxp_core_types.resolver_id -> Neturl.url -> accepted_id) -> unit ->
resolver
class resolve_as_file :?file_prefix:[ `Allowed | `Not_recognized | `Required ] -> ?host_prefix:[ `Allowed | `Not_recognized | `Required ] -> ?system_encoding:Pxp_core_types.encoding -> ?map_private_id:(Pxp_core_types.private_id -> Neturl.url) -> ?open_private_id:(Pxp_core_types.private_id ->
Pervasives.in_channel * Pxp_core_types.encoding option) -> ?base_url_defaults_to_cwd:bool -> ?not_resolvable_if_not_found:bool -> unit -> resolver
The full form of a file URL is: file://host/path, where 'host' specifies the host system on which the file identified by 'path' resides. host = "" or host = "localhost" are accepted; other values will raise Not_competent. The standard for file URLs is defined in RFC 1738.
Option ~file_prefix: Specifies how the "file:" prefix of file names is handled:
   `Not_recognized: The prefix is not recognized.
   `Allowed: The prefix is allowed but not required (the default).
   `Required: The prefix is required.
Option ~host_prefix: Specifies how the "//host" phrase of file names is handled:
   `Not_recognized: The phrase is not recognized.
   `Allowed: The phrase is allowed but not required (the default).
   `Required: The phrase is required.
Option ~system_encoding: Specifies the encoding of file names of the local file system. Default: UTF-8.
Options ~map_private_id and ~open_private_id: THESE OPTIONS ARE DEPRECATED! IT IS NOW POSSIBLE TO USE A COMBINED RESOLVER TO ACHIEVE THE SAME EFFECT! These options must always be used together. They specify an exceptional behaviour in case a private ID is to be opened. map_private_id maps the private ID to a URL (or raises Not_competent). However, instead of opening the URL, the function open_private_id is called to get an in_channel to read from and to get the character encoding. The URL is taken into account when relative SYSTEM IDs must subsequently be resolved.
Option ~base_url_defaults_to_cwd: If true, relative URLs are interpreted relative to the current working directory at the time the class is instantiated, but only if there is no parent URL, i.e. rid_system_base=None. If false (the default), such URLs cannot be resolved. In general, it is better to set this option to false, and to initialize rid_system_base properly.
Option ~not_resolvable_if_not_found: If true (the default),
"File not found" errors stop the resolution process. If false,
"File not found" is treated as Not_competent.
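The three values of ~file_prefix can be pictured with a toy function. This is only a sketch of the policy as described above, not the code of resolve_as_file: the names prefix_policy, Toy_not_competent, has_prefix and strip_file_prefix are hypothetical, and two details are assumptions made for the example, namely that a missing but required prefix raises a Not_competent-like exception, and that `Not_recognized leaves the name untouched.

```ocaml
(* Hypothetical helper, not part of Pxp_reader: a toy model of the
   three prefix policies applied to the "file:" prefix. *)
type prefix_policy = [ `Allowed | `Not_recognized | `Required ]

exception Toy_not_competent   (* stand-in for Pxp_reader.Not_competent *)

let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Return the name with "file:" removed according to [policy]. *)
let strip_file_prefix (policy : prefix_policy) (name : string) : string =
  let prefix = "file:" in
  let n = String.length prefix in
  let rest () = String.sub name n (String.length name - n) in
  match policy, has_prefix ~prefix name with
  | `Not_recognized, _ -> name          (* assumption: never interpreted *)
  | `Allowed, true -> rest ()
  | `Allowed, false -> name
  | `Required, true -> rest ()
  | `Required, false -> raise Toy_not_competent   (* assumption *)
```

The same three-way policy would apply to the "//host" phrase controlled by ~host_prefix.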
val make_file_url : ?system_encoding:Pxp_core_types.encoding ->
?enc:Pxp_core_types.encoding -> string -> Neturl.url
~system_encoding: Specifies the encoding of file names of the local file system. Default: UTF-8. (This argument is necessary to interpret Sys.getcwd() correctly.)
~enc: The encoding of the passed string. Defaults to `Enc_utf8.
Note: To get a string representation of the URL, apply
Neturl.string_of_url to the result.
The following classes and functions create resolvers for catalogs
of PUBLIC or SYSTEM identifiers.
class lookup_id :(Pxp_core_types.ext_id * resolver) list ->
resolver
class lookup_id_as_file :?fixenc:Pxp_core_types.encoding -> (Pxp_core_types.ext_id * string) list ->
resolver
Note: SYSTEM IDs are simply compared literally, without making relative IDs absolute. See norm_system_id below for this function.
~fixenc: Overrides the encoding of the file contents. By default, the
standard rule is applied to find out the encoding of the file.
class lookup_id_as_string :?fixenc:Pxp_core_types.encoding -> (Pxp_core_types.ext_id * string) list ->
resolver
Note: SYSTEM IDs are simply compared literally, without making
relative IDs absolute. See norm_system_id below for this function.
class lookup_public_id :(string * resolver) list ->
resolver
The subresolver is invoked if an entity with the corresponding PUBLIC
id is to be opened.
class lookup_public_id_as_file :?fixenc:Pxp_core_types.encoding -> (string * string) list ->
resolver
Note: This class does not enable the resolution of inner IDs of PUBLIC entities by relative SYSTEM names. To get this effect, use the class lookup_id, and feed it with combined Public(pubid,sysid) identifiers. In this case, the entity has both a PUBLIC and a SYSTEM ID, and resolution of inner relative SYSTEM names works.
~fixenc: Overrides the encoding of the file contents. By default, the
standard rule is applied to find out the encoding of the file.
class lookup_public_id_as_string :?fixenc:Pxp_core_types.encoding -> (string * string) list ->
resolver
~fixenc: Overrides the encoding of the strings.
class lookup_system_id :(string * resolver) list ->
resolver
Important note: Two SYSTEM IDs are considered equal if they are equal in their string representation. (This may not be what you want and may cause trouble... However, I currently do not know how to implement a "semantic" comparison logic.)
Note: SYSTEM IDs are simply compared literally, without making
relative IDs absolute. See norm_system_id below for this function.
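The consequence of literal comparison can be seen in a small stand-alone sketch. It uses only the OCaml standard library; the catalog contents and the function lookup are made up for illustration and do not call into Pxp_reader.

```ocaml
(* Toy catalog: SYSTEM IDs mapped to file names, as plain strings.
   The entries are made up for illustration. *)
let catalog = [ "file:/dir/f.xml", "/dir/f.xml" ]

(* Literal lookup: the requested ID must equal a key byte for byte. *)
let lookup (sysid : string) : string option =
  List.assoc_opt sysid catalog
```

Here lookup "file:/dir/f.xml" finds the entry, while the relative name "f.xml" and the alternative spelling "file:///dir/f.xml" do not, even though all three may denote the same file. This is the situation in which normalizing the IDs first becomes useful.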
class lookup_system_id_as_file :?fixenc:Pxp_core_types.encoding -> (string * string) list ->
resolver
Note: SYSTEM IDs are simply compared literally, without making relative IDs absolute. See norm_system_id below for this function.
~fixenc: Overrides the encoding of the file contents. By default, the
standard rule is applied to find out the encoding of the file.
class lookup_system_id_as_string :?fixenc:Pxp_core_types.encoding -> (string * string) list ->
resolver
Note: SYSTEM IDs are simply compared literally, without making relative IDs absolute. See norm_system_id below for this function.
~fixenc: Overrides the encoding of the strings.
class norm_system_id :resolver ->
resolver
class rewrite_system_id :?forward_unmatching_urls:bool -> (string * string) list -> resolver ->
resolver
type combination_mode =
| Public_before_system
| System_before_public
class combine :?mode:combination_mode -> resolver list ->
resolver
If the entity to open has several names, e.g. a public name and a system name, these names are tried in parallel by default (this is possible in the PXP 1.2 model). For backward compatibility, the ~mode argument allows one to specify a different order:
(1) Try first to open as public identifier, and if that fails, fall back to the system identifier (Public_before_system)
(2) Try first to open as system identifier, and if that fails, fall back to the public identifier (System_before_public)
Clones: If the 'clone' method is invoked before 'open_rid', all contained
resolvers are cloned and again combined. If the 'clone' method is
invoked after 'open_rid' (i.e. while the resolver is open), only the
active resolver is cloned.
TODO: The following examples recommend deprecated classes.
EXAMPLES OF RESOLVERS:
let r1 = new resolve_as_file ()
r2; r1
Now a bigger example. The task is to:
let r =
  new combine
    [ lookup_public_id_as_file
        [ "P", "file_for_p"; "Q", "file_for_q" ];
      lookup_system_id_as_file
        [ "http://r/s.dtd", "file_for_this_dtd" ];
      new resolve_as_file()
    ]
in
(* The recommended way to create the start_id from file names: *)
let start_url =
  make_file_url "f.xml" in
let start_id =
  System (Neturl.string_of_url start_url) in
let source = ExtID(start_id, r) in
parse_document_entity ... source ...
----------------------------------------------------------------------
A variation:
let r =
  new combine
    [ lookup_public_id_as_file
        [ "P", "file_for_p"; "Q", "file_for_q" ];
      lookup_system_id_as_file
        [ "http://r/s.dtd", "file_for_this_dtd" ];
      new resolve_read_any_channel
        ~channel_of_id:(fun xid ->
          if xid = start_id then
            open_in_bin "f.xml", None (* you may want to catch Sys_error *)
          else raise Not_competent)
        ()
    ]
in
let source = ExtID(start_id, r) in
parse_document_entity ... source ...
----------------------------------------------------------------------
Three further examples can be found in the source of Pxp_yacc (file
pxp_yacc.m2y): the implementations of from_file, from_channel, and
from_string are also applications of the Pxp_reader objects.
DEPRECATED CLASSES
class resolve_read_this_channel :?id:Pxp_core_types.ext_id -> ?fixenc:Pxp_core_types.encoding -> ?close:(Pervasives.in_channel -> unit) -> Pervasives.in_channel ->
resolver
class resolve_read_any_channel :?close:(Pervasives.in_channel -> unit) -> channel_of_id:(Pxp_core_types.ext_id ->
Pervasives.in_channel * Pxp_core_types.encoding option) -> unit -> resolver
Note: The function channel_of_id may be called several times to find
out the right ext_id from the current resolver_id. The first result
is taken that is not Not_competent.
resolve_read_any_channel f_open ():
This resolver calls the function f_open to open a new channel for
the passed ext_id. This function must either return the channel and
the encoding, or it must fail with Not_competent.
The function must return None as encoding if the default mechanism to
recognize the encoding should be used. It must return Some e if it is
already known that the encoding of the channel is e.
When the resolver is closed, the function passed by the ~close
argument is called. By default, the channel is closed
(i.e. the default is: ~close:close_in).
class resolve_read_url_channel :?base_url:Neturl.url -> ?close:(Pervasives.in_channel -> unit) -> url_of_id:(Pxp_core_types.ext_id -> Neturl.url) -> channel_of_url:(Pxp_core_types.ext_id ->
Neturl.url -> Pervasives.in_channel * Pxp_core_types.encoding option) -> unit -> resolver
Note: The function url_of_id may be called several times to find out the right ext_id from the current resolver_id. The first result is taken that is not Not_competent.
Note: The optional argument base_url is ignored. The class always uses
the rid_system_base string to interpret relative URLs.
resolve_read_url_channel url_of_id channel_of_url ():
When this resolver gets an ID to read from, it calls the function ~url_of_id to get the corresponding URL. This URL may be a relative URL; however, a URL scheme must be used which contains a path. The resolver converts the URL to an absolute URL if necessary. The second function, ~channel_of_url, is fed with the absolute URL as input. This function opens the resource to read from, and returns the channel and the encoding of the resource.
Both functions, ~url_of_id and ~channel_of_url, can raise Not_competent to indicate that the object is not able to read from the specified resource. However, there is a difference: A Not_competent from ~url_of_id is left as it is, but a Not_competent from ~channel_of_url is converted to Not_resolvable. So only ~url_of_id decides which URLs are accepted by the resolver and which not.
The function ~channel_of_url must return None as encoding if the default mechanism to recognize the encoding should be used. It must return Some e if it is already known that the encoding of the channel is e.
When the resolver is closed, the function passed by the ~close argument is called. By default, the channel is closed (i.e. the default is: ~close:close_in).
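The different treatment of Not_competent in the two callbacks can be sketched with a toy model. This is not the PXP implementation: plain strings stand in for ext_id, Neturl.url and the opened channel, and the names Toy_not_competent, Toy_not_resolvable and toy_resolve are hypothetical. Which exception gets wrapped inside the "not resolvable" error is also an assumption of this sketch.

```ocaml
(* Toy model with stand-in exceptions and plain strings instead of
   ext_id, Neturl.url and in_channel. Not the PXP implementation. *)
exception Toy_not_competent
exception Toy_not_resolvable of exn

let toy_resolve
    ~(url_of_id : string -> string)
    ~(channel_of_url : string -> string)
    (id : string) : string =
  (* Stage 1: a Toy_not_competent raised here propagates unchanged, so
     the caller may try another resolver. *)
  let url = url_of_id id in
  (* Stage 2: the ID was accepted, so a failure to open is final. *)
  try channel_of_url url
  with Toy_not_competent -> raise (Toy_not_resolvable Toy_not_competent)
```

The design point is that only the first stage decides whether the resolver is responsible for an ID; once it has accepted the ID, a refusal in the second stage is reported as an error rather than passed on as "try someone else".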
Does not apply to current implementation but to former ones:
Objects of this class contain a base URL relative to which relative
URLs are interpreted. When creating a new object, you can specify
the base URL by passing it as ~base_url argument. When an existing
object is cloned, the base URL of the clone is the URL of the original
object.
Note that the term "base URL" has a strict definition in RFC 1808.
class resolve_read_this_string :?id:Pxp_core_types.ext_id -> ?fixenc:Pxp_core_types.encoding -> string ->
resolver
class resolve_read_any_string :string_of_id:(Pxp_core_types.ext_id -> string * Pxp_core_types.encoding option) -> unit ->
resolver
val lookup_public_id_as_file : ?fixenc:Pxp_core_types.encoding ->
(string * string) list -> resolver
val lookup_public_id_as_string : ?fixenc:Pxp_core_types.encoding ->
(string * string) list -> resolver
val lookup_system_id_as_file : ?fixenc:Pxp_core_types.encoding ->
(string * string) list -> resolver
val lookup_system_id_as_string : ?fixenc:Pxp_core_types.encoding ->
(string * string) list -> resolver