| 1 | \documentclass[11pt]{article} |
---|
| 2 | %\decimalpoint |
---|
| 3 | \tolerance 10000 |
---|
| 4 | \textheight 24cm |
---|
| 5 | \textwidth 16cm |
---|
| 6 | \oddsidemargin 1mm |
---|
| 7 | \topmargin -20mm |
---|
| 8 | \parindent 0mm |
---|
| 9 | \begin{document}\title{xmlf90: A parser for XML in Fortran90} |
---|
| 10 | \author{Alberto Garc\'{\i}a \\ |
---|
| 11 | Departamento de F\'{\i}sica de la Materia Condensada \\ |
---|
| 12 | Facultad de Ciencia y Tecnolog\'{\i}a \\ |
---|
| 13 | Universidad del Pa\'{\i}s Vasco\\ |
---|
| 14 | Apartado 644, 48080 Bilbao, Spain\\ |
---|
| 15 | http://lcdx00.wm.lc.ehu.es/ag/xml/} |
---|
| 16 | \date{30 January 2004 --- xmlf90 Version 1.1} |
---|
| 17 | |
---|
| 18 | \maketitle\section{Introduction} |
---|
| 19 | |
---|
| 20 | {\bf NOTE: This version of the User Guide and Tutorial does not |
---|
| 21 | cover either the WXML printing library or the new DOM API |
---|
| 22 | conceived by Jon Wakelin. See the html reference material and the |
---|
| 23 | relevant example subdirectories.} |
---|
| 24 | \bigskip |
---|
| 25 | |
---|
| 26 | This tutorial documents the user interface of \texttt{xmlf90}, a |
---|
| 27 | native Fortran90 XML parser. The parser was designed to be a useful |
---|
| 28 | tool in the extraction and analysis of data in the context of |
---|
| 29 | scientific computing, and thus the priorities were efficiency and the |
---|
| 30 | ability to deal with very large XML files while maintaining a small |
---|
| 31 | memory footprint. There are two programming interfaces. The first is |
---|
| 32 | based on the very successful SAX (Simple API for XML) model: the |
---|
| 33 | parser calls routines provided by the user to handle certain events, |
---|
| 34 | such as the encounter of the beginning of an element, or the end of an |
---|
| 35 | element, or the reading of character data. The other is based on the |
---|
| 36 | XPATH standard. Only a very limited set of the full XPATH |
---|
| 37 | specification is offered, but it is already quite useful. |
---|
| 38 | |
---|
| 39 | Some familiarity with XML is assumed. Apart from the examples discussed |
---|
| 40 | in this tutorial (chosen for their simplicity), the interested reader |
---|
| 41 | can refer to the \texttt{Examples/} directory in the \texttt{xmlf90} |
---|
| 42 | distribution. |
---|
| 43 | |
---|
| 44 | |
---|
| 45 | |
---|
| 46 | \section{The SAX interface} |
---|
| 47 | \subsection{A simple example} |
---|
| 48 | |
---|
| 49 | To illustrate the working of the SAX interface, consider the following |
---|
| 50 | XML snippet: |
---|
| 51 | |
---|
| 52 | \begin{verbatim} |
---|
| 53 | <item id="003"> |
---|
| 54 | <description>Washing machine</description> |
---|
| 55 | <price currency="euro">1500.00</price> |
---|
| 56 | </item> |
---|
| 57 | \end{verbatim} |
---|
| 58 | % |
---|
| 59 | When the parser processes this snippet, it carries out the sequence of calls: |
---|
| 60 | |
---|
| 61 | \begin{enumerate} |
---|
| 62 | \item call to \texttt{begin\_element\_handler} with name="item" and |
---|
| 63 | attributes=(Dictionary with the pair (id,003)) |
---|
| 64 | \item call to \texttt{begin\_element\_handler} with name="description" and an |
---|
| 65 | empty attribute dictionary. |
---|
| 66 | \item call to \texttt{pcdata\_chunk\_handler} with pcdata="Washing machine" |
---|
| 67 | \item call to \texttt{end\_element\_handler} with name="description" |
---|
| 68 | \item call to \texttt{begin\_element\_handler} with name="price" and |
---|
| 69 | attributes=(Dictionary with the pair (currency,euro)) |
---|
| 70 | \item call to \texttt{pcdata\_chunk\_handler} with pcdata="1500.00" |
---|
| 71 | \item call to \texttt{end\_element\_handler} with name="price" |
---|
| 72 | \item call to \texttt{end\_element\_handler} with name="item" |
---|
| 73 | \end{enumerate} |
---|
| 74 | |
---|
| 75 | The handler routines are written by the user and passed to the parser |
---|
| 76 | as procedure arguments. A simple program that parses the above XML |
---|
| 77 | fragment (assuming it resides in file \textsl{inventory.xml}) and |
---|
| 78 | prints out the names of the elements and any \textsl{id} attributes as |
---|
| 79 | they are found, is: |
---|
| 80 | |
---|
| 81 | \begin{verbatim} |
---|
| 82 | program simple |
---|
| 83 | use flib_sax |
---|
| 84 | |
---|
| 85 | type(xml_t) :: fxml ! XML file object (opaque) |
---|
| 86 | integer :: iostat ! Return code (0 if OK) |
---|
| 87 | |
---|
| 88 | call open_xmlfile("inventory.xml",fxml,iostat) |
---|
| 89 | if (iostat /= 0) stop "cannot open xml file" |
---|
| 90 | |
---|
| 91 | call xml_parse(fxml, begin_element_handler=begin_element_print) |
---|
| 92 | |
---|
| 93 | contains !---------------- handler subroutine follows |
---|
| 94 | |
---|
| 95 | subroutine begin_element_print(name,attributes) |
---|
| 96 | character(len=*), intent(in) :: name |
---|
| 97 | type(dictionary_t), intent(in) :: attributes |
---|
| 98 | |
---|
| 99 | character(len=3) :: id |
---|
| 100 | integer :: status |
---|
| 101 | |
---|
| 102 | print *, "Start of element: ", name |
---|
| 103 | if (has_key(attributes,"id")) then |
---|
| 104 | call get_value(attributes,"id",id,status) |
---|
| 105 | print *, " Id attribute: ", id |
---|
| 106 | endif |
---|
| 107 | end subroutine begin_element_print |
---|
| 108 | |
---|
| 109 | end program simple |
---|
| 110 | \end{verbatim} |
---|
| 111 | % |
---|
| 112 | To access the XML parsing functionality, the user only needs to \texttt{use} |
---|
| 113 | the module \texttt{flib\_sax}, open the XML file, and call the main routine |
---|
| 114 | \texttt{xml\_parse}, providing it with the appropriate event handlers. |
---|
| 115 | |
---|
| 116 | The subroutine interfaces are: |
---|
| 117 | |
---|
| 118 | \begin{verbatim} |
---|
| 119 | subroutine open_xmlfile(fname,fxml,iostat) |
---|
| 120 | character(len=*), intent(in) :: fname ! File name |
---|
| 121 | type(xml_t), intent(out) :: fxml ! XML file object (opaque) |
---|
| 122 | integer, intent(out ) :: iostat ! Return code (0 if OK) |
---|
| 123 | |
---|
| 124 | |
---|
| 125 | subroutine xml_parse(fxml, & |
---|
| 126 | begin_element_handler, & |
---|
| 127 | end_element_handler, & |
---|
| 128 | pcdata_chunk_handler .... |
---|
| 129 | .... MORE OPTIONAL HANDLERS ) |
---|
| 130 | |
---|
| 131 | \end{verbatim} |
---|
| 132 | |
---|
| 133 | The handlers are OPTIONAL arguments (in the above example we just |
---|
| 134 | specify \texttt{begin\_element\_handler}). If no handlers are given, |
---|
| 135 | nothing useful will happen, except that any errors are detected and |
---|
| 136 | reported. The interfaces for the most useful handlers are: |
---|
| 137 | |
---|
| 138 | \begin{verbatim} |
---|
| 139 | subroutine begin_element_handler(name,attributes) |
---|
| 140 | character(len=*), intent(in) :: name |
---|
| 141 | type(dictionary_t), intent(in) :: attributes |
---|
| 142 | end subroutine begin_element_handler |
---|
| 143 | |
---|
| 144 | subroutine end_element_handler(name) |
---|
| 145 | character(len=*), intent(in) :: name |
---|
| 146 | end subroutine end_element_handler |
---|
| 147 | |
---|
| 148 | subroutine pcdata_chunk_handler(chunk) |
---|
| 149 | character(len=*), intent(in) :: chunk |
---|
| 150 | end subroutine pcdata_chunk_handler |
---|
| 151 | \end{verbatim} |
---|
| 152 | |
---|
| 153 | The attribute information in an element tag is represented as a |
---|
| 154 | dictionary of name/value pairs, held in a \texttt{dictionary\_t} |
---|
| 155 | abstract type. The information in it can be accessed through a set of |
---|
| 156 | dictionary methods such as \texttt{has\_key} and \texttt{get\_value} |
---|
| 157 | (full interfaces to be found in Sect.~\ref{sec:reference}). |
---|
| 158 | |
---|
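|  | To watch the sequence of events on a real file, a small tracing module |
---|
|  | can help. The following is only a sketch: the module name |
---|
|  | \texttt{m\_trace}, its routine names and the \texttt{depth} counter are |
---|
|  | made up for this illustration, and it relies only on the |
---|
|  | \texttt{has\_key} and \texttt{get\_value} calls already shown. It |
---|
|  | assumes that \texttt{get\_value} returns a zero \texttt{status} on |
---|
|  | success, and it skips blank chunks in case the parser reports the |
---|
|  | white space between tags as character data. |
---|
|  | |
---|
|  | \begin{verbatim} |
---|
|  | module m_trace |
---|
|  | use flib_sax |
---|
|  | private |
---|
|  | public :: trace_begin, trace_end, trace_pcdata |
---|
|  | ! |
---|
|  | integer, private :: depth = 0 ! Current nesting level |
---|
|  | ! |
---|
|  | contains !----------------------------------------- |
---|
|  | ! |
---|
|  | subroutine trace_begin(name,attributes) |
---|
|  | character(len=*), intent(in) :: name |
---|
|  | type(dictionary_t), intent(in) :: attributes |
---|
|  | |
---|
|  | character(len=40) :: id |
---|
|  | integer :: status |
---|
|  | |
---|
|  | print *, repeat("  ",depth), "begin: ", name |
---|
|  | if (has_key(attributes,"id")) then |
---|
|  | call get_value(attributes,"id",id,status) |
---|
|  | if (status == 0) print *, repeat("  ",depth), "  id = ", trim(id) |
---|
|  | endif |
---|
|  | depth = depth + 1 ! We are now one level deeper |
---|
|  | end subroutine trace_begin |
---|
|  | !--------------------------------------------------------------- |
---|
|  | subroutine trace_end(name) |
---|
|  | character(len=*), intent(in) :: name |
---|
|  | |
---|
|  | depth = depth - 1 ! Leave the element before printing |
---|
|  | print *, repeat("  ",depth), "end: ", name |
---|
|  | end subroutine trace_end |
---|
|  | !--------------------------------------------------------------- |
---|
|  | subroutine trace_pcdata(chunk) |
---|
|  | character(len=*), intent(in) :: chunk |
---|
|  | |
---|
|  | ! Skip chunks that consist only of blanks |
---|
|  | if (len_trim(chunk) > 0) then |
---|
|  | print *, repeat("  ",depth), "pcdata: ", trim(chunk) |
---|
|  | endif |
---|
|  | end subroutine trace_pcdata |
---|
|  | !--------------------------------------------------------------- |
---|
|  | end module m_trace |
---|
|  | \end{verbatim} |
---|
|  | % |
---|
|  | The three routines would be passed to \texttt{xml\_parse} as the |
---|
|  | \texttt{begin\_element\_handler}, \texttt{end\_element\_handler} and |
---|
|  | \texttt{pcdata\_chunk\_handler} arguments, respectively. |
---|
|  | |
---|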
| 159 | \subsection{Monitoring the sequence of events} |
---|
| 160 | The above example is too simple and not very useful if what we want is |
---|
| 161 | to extract information in a coherent manner. For example, assume we |
---|
| 162 | have a more complete inventory of appliances such as |
---|
| 163 | % |
---|
| 164 | \begin{verbatim} |
---|
| 165 | <inventory> |
---|
| 166 | <item id="003"> |
---|
| 167 | <description>Washing machine</description> |
---|
| 168 | <price currency="euro">1500.00</price> |
---|
| 169 | </item> |
---|
| 170 | <item id="007"> |
---|
| 171 | <description>Microwave oven</description> |
---|
| 172 | <price currency="euro">300.00</price> |
---|
| 173 | </item> |
---|
| 174 | <item id="011"> |
---|
| 175 | <description>Dishwasher</description> |
---|
| 176 | <price currency="swedish crown">10000.00</price> |
---|
| 177 | </item> |
---|
| 178 | </inventory> |
---|
| 179 | \end{verbatim} |
---|
| 180 | % |
---|
| 181 | and we want to print the items with their prices in the form: |
---|
| 182 | % |
---|
| 183 | \begin{verbatim} |
---|
| 184 | 003 Washing machine : 1500.00 euro |
---|
| 185 | 007 Microwave oven : 300.00 euro |
---|
| 186 | 011 Dishwasher : 10000.00 swedish crown |
---|
| 187 | \end{verbatim} |
---|
| 188 | |
---|
| 189 | We begin by writing the following module: |
---|
| 190 | |
---|
| 191 | \begin{verbatim} |
---|
| 192 | module m_handlers |
---|
| 193 | use flib_sax |
---|
| 194 | private |
---|
| 195 | public :: begin_element, end_element, pcdata_chunk |
---|
| 196 | ! |
---|
| 197 | logical, private :: in_item = .false., in_description = .false., in_price = .false. |
---|
| 198 | character(len=40), private :: what, price, currency, id |
---|
| 199 | ! |
---|
| 200 | contains !----------------------------------------- |
---|
| 201 | ! |
---|
| 202 | subroutine begin_element(name,attributes) |
---|
| 203 | character(len=*), intent(in) :: name |
---|
| 204 | type(dictionary_t), intent(in) :: attributes |
---|
| 205 | |
---|
| 206 | integer :: status |
---|
| 207 | |
---|
| 208 | select case(name) |
---|
| 209 | case("item") |
---|
| 210 | in_item = .true. |
---|
| 211 | call get_value(attributes,"id",id,status) |
---|
| 212 | |
---|
| 213 | case("description") |
---|
| 214 | in_description = .true. |
---|
| 215 | |
---|
| 216 | case("price") |
---|
| 217 | in_price = .true. |
---|
| 218 | call get_value(attributes,"currency",currency,status) |
---|
| 219 | |
---|
| 220 | end select |
---|
| 221 | |
---|
| 222 | end subroutine begin_element |
---|
| 223 | !--------------------------------------------------------------- |
---|
| 224 | subroutine pcdata_chunk(chunk) |
---|
| 225 | character(len=*), intent(in) :: chunk |
---|
| 226 | |
---|
| 227 | if (in_description) what = chunk |
---|
| 228 | if (in_price) price = chunk |
---|
| 229 | |
---|
| 230 | end subroutine pcdata_chunk |
---|
| 231 | !--------------------------------------------------------------- |
---|
| 232 | subroutine end_element(name) |
---|
| 233 | character(len=*), intent(in) :: name |
---|
| 234 | |
---|
| 235 | select case(name) |
---|
| 236 | case("item") |
---|
| 237 | in_item = .false. |
---|
| 238 | write(unit=*,fmt="(5(a,1x))") trim(id), trim(what), ":", & |
---|
| 239 | trim(price), trim(currency) |
---|
| 240 | |
---|
| 241 | case("description") |
---|
| 242 | in_description = .false. |
---|
| 243 | |
---|
| 244 | case("price") |
---|
| 245 | in_price = .false. |
---|
| 246 | |
---|
| 247 | end select |
---|
| 248 | |
---|
| 249 | end subroutine end_element |
---|
| 250 | !--------------------------------------------------------------- |
---|
| 251 | end module m_handlers |
---|
| 252 | \end{verbatim} |
---|
| 253 | % |
---|
| 254 | PCDATA chunks are passed back as simple Fortran character variables, |
---|
| 255 | and we assign them to \texttt{what} or \texttt{price} depending on the |
---|
| 256 | context, which we monitor through the logical variables |
---|
| 257 | \texttt{in\_description, in\_price}, updated as we enter and leave |
---|
| 258 | different elements. (The variable \texttt{in\_item} is not strictly |
---|
| 259 | necessary.) |
---|
| 260 | |
---|
| 261 | The program to parse the file just needs to use the functionality in |
---|
| 262 | the module \texttt{m\_handlers}: |
---|
| 263 | % |
---|
| 264 | \begin{verbatim} |
---|
| 265 | program inventory |
---|
| 266 | use flib_sax |
---|
| 267 | use m_handlers |
---|
| 268 | |
---|
| 269 | type(xml_t) :: fxml ! XML file object (opaque) |
---|
| 270 | integer :: iostat |
---|
| 271 | |
---|
| 272 | call open_xmlfile("inventory.xml",fxml,iostat) |
---|
| 273 | if (iostat /= 0) stop "cannot open xml file" |
---|
| 274 | |
---|
| 275 | call xml_parse(fxml, begin_element_handler=begin_element, & |
---|
| 276 | end_element_handler=end_element, & |
---|
| 277 | pcdata_chunk_handler=pcdata_chunk ) |
---|
| 278 | |
---|
| 279 | end program inventory |
---|
| 280 | |
---|
| 281 | \end{verbatim} |
---|
| 282 | % |
---|
| 283 | \subsubsection{Exercises} |
---|
| 284 | \begin{enumerate} |
---|
| 285 | \item Type the above Fortran files and the XML file into your |
---|
| 286 | computer. Compile and run the program and check that the output is |
---|
| 287 | correct. (Compilation instructions are provided in |
---|
| 288 | Sect.~\ref{sec:compiling}.) |
---|
| 289 | \item Edit the XML file and remove one of the \texttt{</item>} |
---|
| 290 | lines. What happens? This is an example of a \textsl{malformed} XML |
---|
| 291 | file. The parser can detect it and complain about it. |
---|
| 292 | \item Edit the XML file and remove the \texttt{currency} attribute |
---|
| 293 | from one of the elements. What happens? In this case, the parser |
---|
| 294 | cannot detect the missing attribute (it is not a \textsl{validating |
---|
| 295 | parser}). However, it could be possible for the user to detect early |
---|
| 296 | that something is wrong by checking the value of the \texttt{status} |
---|
| 297 | variable after the call to \texttt{get\_value}. |
---|
| 298 | \item Modify the program to print all prices in euros (1 euro buys |
---|
| 299 | approximately 9.2 Swedish crowns). |
---|
| 300 | \end{enumerate} |
---|
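|  | |
---|
|  | For the last exercise, the PCDATA string has to be turned into a number |
---|
|  | before any arithmetic can be done. One possible approach, sketched here |
---|
|  | as a stand-alone program that does not need \texttt{xmlf90}, is a Fortran |
---|
|  | internal read. The function name \texttt{price\_in\_euros} is |
---|
|  | hypothetical, and the rate is the approximate one quoted in the exercise. |
---|
|  | |
---|
|  | \begin{verbatim} |
---|
|  | program to_euros |
---|
|  | implicit none |
---|
|  | ! Approximate rate quoted in the exercise: 1 euro buys 9.2 crowns |
---|
|  | real, parameter :: crowns_per_euro = 9.2 |
---|
|  | real :: euros |
---|
|  | |
---|
|  | euros = price_in_euros("10000.00","swedish crown") |
---|
|  | print "(a,f10.2)", "Dishwasher (euro): ", euros |
---|
|  | euros = price_in_euros("300.00","euro") |
---|
|  | print "(a,f10.2)", "Microwave oven (euro): ", euros |
---|
|  | |
---|
|  | contains !----------------------------------------- |
---|
|  | |
---|
|  | function price_in_euros(price,currency) result(euros) |
---|
|  | character(len=*), intent(in) :: price, currency |
---|
|  | real :: euros |
---|
|  | integer :: ios |
---|
|  | |
---|
|  | ! Internal read: convert the character string to a real number |
---|
|  | read(unit=price,fmt=*,iostat=ios) euros |
---|
|  | if (ios /= 0) stop "cannot convert price to a number" |
---|
|  | |
---|
|  | select case(trim(currency)) |
---|
|  | case("euro") |
---|
|  | ! Already in euros: nothing to do |
---|
|  | case("swedish crown") |
---|
|  | euros = euros / crowns_per_euro |
---|
|  | case default |
---|
|  | stop "unknown currency" |
---|
|  | end select |
---|
|  | end function price_in_euros |
---|
|  | |
---|
|  | end program to_euros |
---|
|  | \end{verbatim} |
---|
|  | % |
---|
|  | In the \texttt{m\_handlers} module, such a function could be called from |
---|
|  | \texttt{end\_element} just before the line for an item is written. |
---|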
| 301 | |
---|
| 302 | \subsection{Other tags and their handlers} |
---|
| 303 | |
---|
| 304 | The parser can also process comments, XML declarations (which use the |
---|
| 305 | same syntax as ``processing instructions''), and SGML declarations, although the |
---|
| 306 | latter two are not acted upon in any way (in particular, no attempt is |
---|
| 307 | made to validate the XML document). |
---|
| 308 | |
---|
| 309 | \begin{itemize} |
---|
| 310 | |
---|
| 311 | \item |
---|
| 312 | An \textbf{empty element} tag of the form |
---|
| 313 | % |
---|
| 314 | \begin{verbatim} |
---|
| 315 | <name att="value"... /> |
---|
| 316 | \end{verbatim} |
---|
| 317 | % |
---|
| 318 | can be handled as successive calls to \texttt{begin\_element\_handler} |
---|
| 319 | and \texttt{end\_element\_handler}. However, if the optional handler |
---|
| 320 | \texttt{empty\_element\_handler} is present, it is called instead. Its |
---|
| 321 | interface is exactly the same as that of |
---|
| 322 | \texttt{begin\_element\_handler}: |
---|
| 323 | % |
---|
| 324 | \begin{verbatim} |
---|
| 325 | subroutine empty_element_handler(name,attributes) |
---|
| 326 | character(len=*), intent(in) :: name |
---|
| 327 | type(dictionary_t), intent(in) :: attributes |
---|
| 328 | end subroutine empty_element_handler |
---|
| 329 | \end{verbatim} |
---|
| 330 | % |
---|
| 331 | \item |
---|
| 332 | \textbf{Comments} are sections of the XML file contained between the markup |
---|
| 333 | \texttt{<!{-}-} and \texttt{{-}->}, |
---|
| 334 | and are handled by the optional argument \texttt{comment\_handler} |
---|
| 335 | % |
---|
| 336 | \begin{verbatim} |
---|
| 337 | subroutine comment_handler(comment) |
---|
| 338 | character(len=*), intent(in) :: comment |
---|
| 339 | end subroutine comment_handler |
---|
| 340 | \end{verbatim} |
---|
| 341 | % |
---|
| 342 | \item |
---|
| 343 | \textbf{XML declarations} can be processed |
---|
| 344 | in the same way as elements, with the ``target" being the element name, etc. |
---|
| 345 | For example, in |
---|
| 346 | % |
---|
| 347 | \begin{verbatim} |
---|
| 348 | <?xml version="1.0"?> |
---|
| 349 | \end{verbatim} |
---|
| 350 | % |
---|
| 351 | \textsl{xml} would be the ``element name", \textsl{version} an |
---|
| 352 | attribute name, and \textsl{1.0} its value. The optional handler |
---|
| 353 | interface is: |
---|
| 354 | % |
---|
| 355 | \begin{verbatim} |
---|
| 356 | subroutine xml_declaration_handler(name,attributes) |
---|
| 357 | character(len=*), intent(in) :: name |
---|
| 358 | type(dictionary_t), intent(in) :: attributes |
---|
| 359 | end subroutine xml_declaration_handler |
---|
| 360 | \end{verbatim} |
---|
| 361 | % |
---|
| 362 | \item |
---|
| 363 | \textbf{SGML declarations} such as entity declarations or doctype |
---|
| 364 | specifications are treated basically as comments. Interface: |
---|
| 365 | % |
---|
| 366 | \begin{verbatim} |
---|
| 367 | subroutine sgml_declaration_handler(sgml_declaration) |
---|
| 368 | character(len=*), intent(in) :: sgml_declaration |
---|
| 369 | end subroutine sgml_declaration_handler |
---|
| 370 | \end{verbatim} |
---|
| 371 | % |
---|
| 372 | \end{itemize} |
---|
| 373 | In the current version of the parser, overly long comments and SGML |
---|
| 374 | declarations might be truncated. |
---|
| 375 | |
---|
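|  | As an illustration, the following sketch passes two of these optional |
---|
|  | handlers to \texttt{xml\_parse}. The routine names \texttt{empty\_print} |
---|
|  | and \texttt{comment\_print} are arbitrary; the keyword names are the ones |
---|
|  | given in the list above, and the file name is the one used in the |
---|
|  | earlier examples. |
---|
|  | |
---|
|  | \begin{verbatim} |
---|
|  | program other_tags |
---|
|  | use flib_sax |
---|
|  | |
---|
|  | type(xml_t) :: fxml ! XML file object (opaque) |
---|
|  | integer :: iostat ! Return code (0 if OK) |
---|
|  | |
---|
|  | call open_xmlfile("inventory.xml",fxml,iostat) |
---|
|  | if (iostat /= 0) stop "cannot open xml file" |
---|
|  | |
---|
|  | call xml_parse(fxml, empty_element_handler=empty_print, & |
---|
|  | comment_handler=comment_print) |
---|
|  | |
---|
|  | contains !---------------- handler subroutines follow |
---|
|  | |
---|
|  | subroutine empty_print(name,attributes) |
---|
|  | character(len=*), intent(in) :: name |
---|
|  | type(dictionary_t), intent(in) :: attributes ! Not used here |
---|
|  | |
---|
|  | print *, "Empty element: ", name |
---|
|  | end subroutine empty_print |
---|
|  | |
---|
|  | subroutine comment_print(comment) |
---|
|  | character(len=*), intent(in) :: comment |
---|
|  | |
---|
|  | print *, "Comment: ", trim(comment) |
---|
|  | end subroutine comment_print |
---|
|  | |
---|
|  | end program other_tags |
---|
|  | \end{verbatim} |
---|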
| 376 | |
---|
| 377 | \section{The XPATH interface} |
---|
| 378 | |
---|
| 379 | \textsl{NOTE: The current implementation takes its inspiration from |
---|
| 380 | XPATH, but it is by no means a complete implementation of the |
---|
| 381 | standard, or even of a subset of it. Since it is built on top of the SAX |
---|
| 382 | interface, it uses a ``stream'' paradigm which is completely alien to |
---|
| 383 | the XPATH specification. It is nevertheless still quite useful. The |
---|
| 384 | author is open to suggestions to refine the interface.} |
---|
| 385 | |
---|
| 386 | \bigskip |
---|
| 387 | |
---|
| 388 | This API is based on the concept of an XML path. For example: |
---|
| 389 | % |
---|
| 390 | \begin{verbatim} |
---|
| 391 | /inventory/item |
---|
| 392 | \end{verbatim} |
---|
| 393 | % |
---|
| 394 | represents an 'item' element which is a child of the root element |
---|
| 395 | 'inventory'. Paths can contain special wildcard markers such as |
---|
| 396 | \texttt{//} and \texttt{*}. The following are examples of valid paths: |
---|
| 397 | % |
---|
| 398 | \begin{verbatim} |
---|
| 399 | //a : Any occurrence of element 'a', at any depth. |
---|
| 400 | /a/*/b : Any 'b' which is a grand-child of 'a' |
---|
| 401 | ./a : A relative path (with respect to the current path) |
---|
| 402 | a : (same as above) |
---|
| 403 | /a/b/./c : Same as /a/b/c (the dot (.) is a dummy) |
---|
| 404 | //* : Any element. |
---|
| 405 | //a/*//b : Any 'b' under any children of 'a'. |
---|
| 406 | |
---|
| 407 | \end{verbatim} |
---|
| 408 | % |
---|
| 409 | \subsection{Simple example} |
---|
| 410 | Using the XPATH interface it is possible to search for any element |
---|
| 411 | directly, and to recover its attributes or character content. For |
---|
| 412 | example, to print the names of all the appliances in the inventory: |
---|
| 413 | % |
---|
| 414 | \begin{verbatim} |
---|
| 415 | program simple |
---|
| 416 | use flib_xpath |
---|
| 417 | |
---|
| 418 | type(xml_t) :: fxml |
---|
| 419 | |
---|
| 420 | integer :: status |
---|
| 421 | character(len=100) :: what |
---|
| 422 | |
---|
| 423 | call open_xmlfile("inventory.xml",fxml,status) |
---|
| 424 | ! |
---|
| 425 | do |
---|
| 426 | call get_node(fxml,path="//description",pcdata=what,status=status) |
---|
| 427 | if (status < 0) exit |
---|
| 428 | print *, "Appliance: ", trim(what) |
---|
| 429 | enddo |
---|
| 430 | end program simple |
---|
| 431 | \end{verbatim} |
---|
| 432 | % |
---|
| 433 | Repeated calls to \texttt{get\_node} return the character content of |
---|
| 434 | the 'description' elements (at any depth). We exit the loop when the |
---|
| 435 | \texttt{status} variable is negative on return from the call. This |
---|
| 436 | indicates that there are no more elements matching the |
---|
| 437 | \texttt{//description} path pattern.\footnote{Returning a negative |
---|
| 438 | value for an end-of-file or end-of-record condition follows |
---|
| 439 | standard Fortran I/O practice. Positive return values signal malfunctions.} |
---|
| 440 | |
---|
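|  | Since positive values signal malfunctions, a more defensive version of |
---|
|  | the loop might distinguish the three cases. This is a sketch under the |
---|
|  | assumption that \texttt{open\_xmlfile} and \texttt{get\_node} both return |
---|
|  | zero on success, as the examples in this tutorial suggest; the program |
---|
|  | name is arbitrary. |
---|
|  | |
---|
|  | \begin{verbatim} |
---|
|  | program simple_checked |
---|
|  | use flib_xpath |
---|
|  | |
---|
|  | type(xml_t) :: fxml |
---|
|  | |
---|
|  | integer :: status |
---|
|  | character(len=100) :: what |
---|
|  | |
---|
|  | call open_xmlfile("inventory.xml",fxml,status) |
---|
|  | if (status /= 0) stop "cannot open xml file" |
---|
|  | ! |
---|
|  | do |
---|
|  | call get_node(fxml,path="//description",pcdata=what,status=status) |
---|
|  | if (status < 0) exit ! Negative: no more matching elements |
---|
|  | if (status > 0) stop "error while searching for //description" |
---|
|  | print *, "Appliance: ", trim(what) ! Zero: a match was found |
---|
|  | enddo |
---|
|  | end program simple_checked |
---|
|  | \end{verbatim} |
---|
|  | |
---|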
| 441 | Apart from path patterns, we can narrow our search by specifying |
---|
| 442 | conditions on the attribute list of the element. For example, to print |
---|
| 443 | only the prices which are given in euros we can use the |
---|
| 444 | \texttt{att\_name} and \texttt{att\_value} optional arguments: |
---|
| 445 | % |
---|
| 446 | \begin{verbatim} |
---|
| 447 | program euros |
---|
| 448 | use flib_xpath |
---|
| 449 | |
---|
| 450 | type(xml_t) :: fxml |
---|
| 451 | |
---|
| 452 | integer :: status |
---|
| 453 | character(len=100) :: price |
---|
| 454 | |
---|
| 455 | call open_xmlfile("inventory.xml",fxml,status) |
---|
| 456 | ! |
---|
| 457 | do |
---|
| 458 | call get_node(fxml,path="//price", & |
---|
| 459 | att_name="currency",att_value="euro", & |
---|
| 460 | pcdata=price,status=status) |
---|
| 461 | if (status < 0) exit |
---|
| 462 | print *, "Price (euro): ", trim(price) |
---|
| 463 | enddo |
---|
| 464 | end program euros |
---|
| 465 | \end{verbatim} |
---|
| 466 | % |
---|
| 467 | We can zero in on any element in this fashion, but we apparently give |
---|
| 468 | up the all-important context. What happens if we want to print |
---|
| 469 | \textsl{both} the appliance description and its price? |
---|
| 470 | % |
---|
| 471 | \begin{verbatim} |
---|
| 472 | program twoelements |
---|
| 473 | use flib_xpath |
---|
| 474 | |
---|
| 475 | type(xml_t) :: fxml |
---|
| 476 | type(dictionary_t) :: attributes |
---|
| 477 | integer :: status |
---|
| 478 | character(len=100) :: what, price, currency |
---|
| 479 | |
---|
| 480 | call open_xmlfile("inventory.xml",fxml,status) |
---|
| 481 | ! |
---|
| 482 | do |
---|
| 483 | call get_node(fxml,path="//description", & |
---|
| 484 | pcdata=what,status=status) |
---|
| 485 | if (status < 0) exit ! No more items |
---|
| 486 | ! |
---|
| 487 | ! Price comes right after description... |
---|
| 488 | ! |
---|
| 489 | call get_node(fxml,path="//price", & |
---|
| 490 | attributes=attributes,pcdata=price,status=status) |
---|
| 491 | if (status /= 0) stop "missing price element!" |
---|
| 492 | |
---|
| 493 | call get_value(attributes,"currency",currency,status) |
---|
| 494 | if (status /= 0) stop "missing currency attribute!" |
---|
| 495 | |
---|
| 496 | write(unit=*,fmt="(6a)") "Appliance: ", trim(what), & |
---|
| 497 | ". Price: ", trim(price), " ", trim(currency) |
---|
| 498 | enddo |
---|
| 499 | end program twoelements |
---|
| 500 | \end{verbatim} |
---|
| 501 | % |
---|
| 502 | \subsubsection{Exercises} |
---|
| 503 | \begin{enumerate} |
---|
| 504 | \item Modify the above programs to print only the appliances priced in |
---|
| 505 | euros. |
---|
| 506 | \item Modify the order of the 'description' and 'price' elements in an |
---|
| 507 | item. What happens to the 'twoelements' program output? |
---|
| 508 | \item The full XPATH specification allows the query for a particular |
---|
| 509 | element among a set of elements with the same path, based on the |
---|
| 510 | ordering of the element. For example, "/inventory/item[2]" will refer |
---|
| 511 | to the second 'item' element in the XML file. Write a routine that |
---|
| 512 | implements this feature and returns the element's attribute |
---|
| 513 | dictionary. |
---|
| 514 | \item Queries for paths can be issued in any order, and so some |
---|
| 515 | mechanism for "rewinding" the XML file is necessary. It is provided by |
---|
| 516 | the appropriately named \texttt{rewind\_xmlfile} subroutine (see full |
---|
| 517 | interface in the Reference section). Use it to implement a silly |
---|
| 518 | program that prints items from the inventory at random. (Extra points |
---|
| 519 | for including logic to minimize the number of rewinds.) |
---|
| 520 | \end{enumerate} |
---|
| 521 | % |
---|
| 522 | |
---|
| 523 | \subsection{Contexts and restricted searches} |
---|
| 524 | |
---|
| 525 | The logic of the \texttt{twoelements} program in the previous section |
---|
| 526 | follows from the assumption that the 'price' element follows the |
---|
| 527 | 'description' element in a typical 'item'. If the DTD says so, and |
---|
| 528 | the XML file is valid (in the technical sense of conforming to the |
---|
| 529 | DTD), the assumption should be correct. However, since the parser is |
---|
| 530 | non-validating, it might be unreasonable to expect the proper |
---|
| 531 | ordering in all cases. What we should expect (as a minimum) is that |
---|
| 532 | both the price and description elements are children of the 'item' |
---|
| 533 | element. In the following version we make use of the \textbf{context} |
---|
| 534 | concept to achieve a more robust solution. |
---|
| 535 | % |
---|
| 536 | \begin{verbatim} |
---|
| 537 | program item_context |
---|
| 538 | use flib_xpath |
---|
| 539 | |
---|
| 540 | type(xml_t) :: fxml, context |
---|
| 541 | type(dictionary_t) :: attributes |
---|
| 542 | integer :: status |
---|
| 543 | character(len=100) :: what, price, currency |
---|
| 544 | |
---|
| 545 | call open_xmlfile("inventory.xml",fxml,status) |
---|
| 546 | ! |
---|
| 547 | do |
---|
| 548 | call mark_node(fxml,path="//item",status=status) |
---|
| 549 | if (status < 0) exit ! No more items |
---|
| 550 | context = fxml ! Save item context |
---|
| 551 | ! |
---|
| 552 | ! Search relative to context |
---|
| 553 | ! |
---|
| 554 | call get_node(fxml,path="price", & |
---|
| 555 | attributes=attributes,pcdata=price,status=status) |
---|
| 556 | call get_value(attributes,"currency",currency,status) |
---|
| 557 | if (status /= 0) stop "missing currency attribute!" |
---|
| 558 | ! |
---|
| 559 | ! Rewind to beginning of context |
---|
| 560 | ! |
---|
| 561 | fxml = context |
---|
| 562 | call sync_xmlfile(fxml) |
---|
| 563 | ! |
---|
| 564 | ! Search relative to context |
---|
| 565 | ! |
---|
| 566 | call get_node(fxml,path="description",pcdata=what,status=status) |
---|
| 567 | write(unit=*,fmt="(6a)") "Appliance: ", trim(what), & |
---|
| 568 | ". Price: ", trim(price), " ", trim(currency) |
---|
| 569 | enddo |
---|
| 570 | end program item_context |
---|
| 571 | \end{verbatim} |
---|
| 572 | % |
---|
| 573 | The call to \texttt{mark\_node} positions the parser's file handle |
---|
| 574 | \texttt{fxml} right after the end of the starting tag of the next |
---|
| 575 | 'item' element. We save that position as a ``context marker" to which |
---|
| 576 | we can return later on. The calls to \texttt{get\_node} use path |
---|
| 577 | patterns that do not start with a \texttt{/}: they are |
---|
| 578 | \textbf{searches relative to the current context}. After getting the |
---|
| 579 | information about the 'price' element, we restore the parser's file |
---|
| 580 | handle to the appropriate position at the beginning of the 'item' |
---|
| 581 | context, and search for the 'description' element. In the following |
---|
| 582 | iteration of the loop, the parser will find the next 'item' element, |
---|
| 583 | and the process will be repeated until there are no more 'item's. |
---|
| 584 | |
---|
| 585 | |
---|
| 586 | Contexts come in handy to encapsulate parsing tasks in re-usable |
---|
| 587 | subroutines. Suppose you are going to find the basic 'item' element |
---|
| 588 | content in a whole lot of different XML files. The following |
---|
| 589 | subroutine extracts the description and price information: |
---|
| 590 | % |
---|
| 591 | \begin{verbatim} |
---|
| 592 | subroutine get_item_info(context,what,price,currency) |
---|
| 593 | type(xml_t), intent(in) :: context |
---|
| 594 | character(len=*), intent(out) :: what, price, currency |
---|
| 595 | |
---|
| 596 | ! |
---|
| 597 | ! Local variables |
---|
| 598 | ! |
---|
| 599 | type(xml_t) :: ff |
---|
| 600 | integer :: status |
---|
| 601 | type(dictionary_t) :: attributes |
---|
| 602 | |
---|
| 603 | ! |
---|
| 604 | ! context is read-only, so make a copy and sync just in case |
---|
| 605 | ! |
---|
| 606 | ff = context |
---|
| 607 | call sync_xmlfile(ff) |
---|
| 608 | ! |
---|
| 609 | call get_node(ff,path="price", & |
---|
| 610 | attributes=attributes,pcdata=price,status=status) |
---|
| 611 | call get_value(attributes,"currency",currency,status) |
---|
| 612 | if (status /= 0) stop "missing currency attribute!" |
---|
| 613 | ! |
---|
| 614 | ! Rewind to beginning of context |
---|
| 615 | ! |
---|
| 616 | ff = context |
---|
| 617 | call sync_xmlfile(ff) |
---|
| 618 | ! |
---|
| 619 | call get_node(ff,path="description",pcdata=what,status=status) |
---|
| 620 | |
---|
| 621 | end subroutine get_item_info |
---|
| 622 | \end{verbatim} |
---|
| 623 | % |
---|
| 624 | Using this routine, the parsing is much more compact: |
---|
| 625 | % |
---|
| 626 | \begin{verbatim} |
---|
| 627 | program item_context |
---|
| 628 | use flib_xpath |
---|
| 629 | |
---|
| 630 | type(xml_t) :: fxml |
---|
| 631 | |
---|
| 632 | integer :: status |
---|
| 633 | character(len=100) :: what, price, currency |
---|
| 634 | |
---|
| 635 | call open_xmlfile("inventory.xml",fxml,status) |
---|
| 636 | ! |
---|
| 637 | do |
---|
| 638 | call mark_node(fxml,path="//item",status=status) |
---|
| 639 | if (status /= 0) exit ! No more items |
---|
| 640 | call get_item_info(fxml,what,price,currency) |
---|
| 641 | write(unit=*,fmt="(6a)") "Appliance: ", trim(what), & |
---|
| 642 | ". Price: ", trim(price), " ", trim(currency) |
---|
| 643 | call sync_xmlfile(fxml) |
---|
| 644 | enddo |
---|
| 645 | end program item_context |
---|
| 646 | \end{verbatim} |
---|
| 647 | % |
---|
| 648 | It is extremely important to understand the meaning of the call to |
---|
| 649 | \texttt{sync\_xmlfile}. The file handle \texttt{fxml} holds parsing |
---|
| 650 | context \textbf{and} a physical pointer to the file position |
---|
| 651 | (basically a variable counting the number of characters read so |
---|
| 652 | far). When the context is passed to the subroutine and the parsing |
---|
| 653 | carried out, the context and the file position get out of |
---|
| 654 | sync. Synchronization means repositioning the physical file pointer |
---|
| 655 | to the place where it was when the context was first created. |
---|
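|  | |
---|
|  | The same pattern of marking, extracting and synchronizing works for any |
---|
|  | task that needs only one item at a time. As a hedged sketch, the |
---|
|  | following program adds up the value of the whole inventory in euros. It |
---|
|  | assumes that \texttt{get\_item\_info} has been placed in a module, here |
---|
|  | called \texttt{m\_item\_info} (a hypothetical name), and it uses the |
---|
|  | approximate exchange rate of 9.2 Swedish crowns per euro quoted in an |
---|
|  | earlier exercise. |
---|
|  | |
---|
|  | \begin{verbatim} |
---|
|  | program inventory_total |
---|
|  | use flib_xpath |
---|
|  | use m_item_info ! Hypothetical module containing get_item_info |
---|
|  | |
---|
|  | type(xml_t) :: fxml |
---|
|  | |
---|
|  | integer :: status, ios |
---|
|  | character(len=100) :: what, price, currency |
---|
|  | real :: value, total |
---|
|  | |
---|
|  | call open_xmlfile("inventory.xml",fxml,status) |
---|
|  | if (status /= 0) stop "cannot open xml file" |
---|
|  | ! |
---|
|  | total = 0.0 |
---|
|  | do |
---|
|  | call mark_node(fxml,path="//item",status=status) |
---|
|  | if (status /= 0) exit ! No more items |
---|
|  | call get_item_info(fxml,what,price,currency) |
---|
|  | call sync_xmlfile(fxml) ! Back in sync with the item context |
---|
|  | ! |
---|
|  | ! Internal read: convert the price string to a number |
---|
|  | ! |
---|
|  | read(unit=price,fmt=*,iostat=ios) value |
---|
|  | if (ios /= 0) stop "cannot convert price to a number" |
---|
|  | select case(trim(currency)) |
---|
|  | case("euro") |
---|
|  | total = total + value |
---|
|  | case("swedish crown") |
---|
|  | total = total + value / 9.2 |
---|
|  | case default |
---|
|  | print *, "Skipping item with unknown currency: ", trim(what) |
---|
|  | end select |
---|
|  | enddo |
---|
|  | print "(a,f12.2,a)", "Inventory total: ", total, " euro" |
---|
|  | end program inventory_total |
---|
|  | \end{verbatim} |
---|
|  | % |
---|
|  | Only the running total is kept between iterations, so the memory used |
---|
|  | does not grow with the size of the inventory. |
---|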
| 656 | |
---|
| 657 | |
---|
| 658 | \subsubsection{Exercises} |
---|
| 659 | \begin{enumerate} |
---|
| 660 | \item Modify the above programs to print only the appliances priced in |
---|
| 661 | euros. |
---|
| 662 | \item Write a program that prints only the most expensive |
---|
| 663 | item. (Assume that the inventory is very large and it is not feasible |
---|
| 664 | to hold everything in memory...) |
---|
| 665 | \item Use the \texttt{get\_item\_info} subroutine to print |
---|
| 666 | descriptions and price information from the following XML file: |
---|
| 667 | % |
---|
| 668 | \begin{verbatim} |
---|
| 669 | <vacations> |
---|
| 670 | <trip> |
---|
| 671 | <description>Mediterranean cruise</description> |
---|
| 672 | <price currency="euro">1500.00</price> |
---|
| 673 | </trip> |
---|
| 674 | <trip> |
---|
| 675 | <description>Week in Majorca</description> |
---|
| 676 | <price currency="euro">300.00</price> |
---|
| 677 | </trip> |
---|
| 678 | <trip> |
---|
| 679 | <description>Wilderness Route</description> |
---|
| 680 | <price currency="swedish crown">10000.00</price> |
---|
| 681 | </trip> |
---|
| 682 | </vacations> |
---|
| 683 | \end{verbatim} |
---|
| 684 | % |
---|
| 685 | (Note that the routine does not care what the context name is: it |
---|
| 686 | could be 'item' or 'trip'. It is only the fact that the children |
---|
| 687 | ('description' and 'price') are the same that matters.) |
---|
| 688 | \end{enumerate} |
---|
| 689 | |
---|
| 690 | \section{Handling of scientific data} |
---|
| 691 | |
---|
| 692 | \subsection{Numerical datasets} |
---|
| 693 | |
---|
| 694 | While the ASCII form is not the most efficient for the storage of |
---|
| 695 | numerical data, the portability and flexibility offered by the XML |
---|
| 696 | format make it attractive for the interchange of scientific |
---|
| 697 | datasets. There are a number of efforts under way to standardize this |
---|
| 698 | area, and presumably we will have nifty tools for the creation and |
---|
| 699 | visualization of files in the near future. Even then, however, it will |
---|
| 700 | be necessary to be able to read numerical information into Fortran |
---|
| 701 | programs. The \texttt{xmlf90} package offers limited but useful |
---|
| 702 | functionality in this regard, making it possible to build numerical |
---|
| 703 | arrays on the fly as the XML file containing the data is parsed. As an |
---|
| 704 | example, consider the dataset: |
---|
| 705 | % |
---|
| 706 | \begin{verbatim} |
---|
| 707 | <data> |
---|
| 708 | 8.90679398599 8.90729421510 8.90780189594 8.90831710494 |
---|
| 709 | 8.90883991832 8.90937041202 8.90990866166 8.91045474255 |
---|
| 710 | 8.91100872963 8.91157069732 8.91214071958 8.91271886986 |
---|
| 711 | 8.91330522098 8.91389984506 8.91450281355 8.91511419713 |
---|
| 712 | 8.91573406560 8.91636248785 8.91699953183 8.91764526444 |
---|
| 713 | 8.91829975142 8.91896305734 8.91963524555 8.92031637799 |
---|
| 714 | 8.92100651514 8.92170571605 8.92241403816 8.92313153711 |
---|
| 715 | 8.92385826683 8.92459427943 8.92533962491 8.92609435120 |
---|
| 716 | 8.92685850416 8.92763212726 8.92841526149 8.92920794545 |
---|
| 717 | </data> |
---|
| 718 | \end{verbatim} |
---|
| 719 | % |
---|
| 720 | and the following fragment of an \texttt{m\_handlers} module for SAX parsing: |
---|
| 721 | % |
---|
| 722 | \begin{verbatim} |
---|
| 723 | |
---|
| 724 | real, dimension(1000) :: x ! numerical array to hold data |
---|
| 725 | |
---|
| 726 | subroutine begin_element(name,attributes) |
---|
| 727 | ... |
---|
| 728 | select case(name) |
---|
| 729 | case("data") |
---|
| 730 | in_data = .true. |
---|
| 731 | ndata = 0 |
---|
| 732 | ... |
---|
| 733 | end select |
---|
| 734 | |
---|
| 735 | end subroutine begin_element |
---|
| 736 | !--------------------------------------------------------------- |
---|
| 737 | subroutine pcdata_chunk_handler(chunk) |
---|
| 738 | character(len=*), intent(in) :: chunk |
---|
| 739 | |
---|
| 740 | if (in_data) call build_data_array(chunk,x,ndata) |
---|
| 741 | ... |
---|
| 742 | |
---|
| 743 | end subroutine pcdata_chunk_handler |
---|
| 744 | !------------------------------------------------------------- |
---|
| 745 | subroutine end_element(name) |
---|
| 746 | ... |
---|
| 747 | select case(name) |
---|
| 748 | case("data") |
---|
| 749 | in_data = .false. |
---|
| 750 | print *, "Read ", ndata, " data elements." |
---|
| 751 | print *, "X: ", x(1:ndata) |
---|
| 752 | ... |
---|
| 753 | end select |
---|
| 754 | |
---|
| 755 | end subroutine end_element |
---|
| 756 | \end{verbatim} |
---|
| 757 | % |
---|
| 758 | When the \texttt{<data>} tag is encountered by the parser, the |
---|
| 759 | variable \texttt{ndata} is initialized. Any PCDATA chunks found from |
---|
| 760 | then on and until the \texttt{</data>} tag is seen are passed to the |
---|
| 761 | \texttt{build\_data\_array} generic subroutine, which converts the |
---|
| 762 | character data to the numerical format (integer, default real, double |
---|
| 763 | precision) implied by the array \texttt{x}. The array is filled with |
---|
| 764 | data and the \texttt{ndata} variable increased accordingly. |
---|
| 765 | |
---|
| 766 | If the data is known to represent a multi-dimensional array (something |
---|
| 767 | that could be encoded in the XML as attributes to the 'data' element, |
---|
| 768 | for example), the user can employ the Fortran \texttt{reshape} |
---|
| 769 | intrinsic to obtain the final form. |
---|
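A minimal sketch of this last step follows. The shape ($2 \times 3$),
the variable names and the hard-wired values are illustrative
assumptions; in a real program the flat array would be the one filled
by \texttt{build\_data\_array} and the shape would be read from the XML
file.
%
\begin{verbatim}
program reshape_demo
  implicit none
  ! Hypothetical shape, e.g. obtained from attributes of <data>
  integer, parameter :: nrows = 2, ncols = 3
  real, dimension(nrows*ncols) :: x   ! flat array, as built by the parser
  real, dimension(nrows,ncols) :: a   ! final two-dimensional form
  integer :: i

  x = (/ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 /)

  ! reshape fills the result in column-major order:
  ! a(1,1)=1, a(2,1)=2, a(1,2)=3, a(2,2)=4, ...
  a = reshape(x, (/ nrows, ncols /))

  do i = 1, nrows
     print *, a(i,:)
  end do
end program reshape_demo
\end{verbatim}
%
Because \texttt{reshape} works in column-major order, data written row
by row in the XML file would need something like
\texttt{transpose(reshape(x, (/ ncols, nrows /)))} instead.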
| 770 | |
---|
| 771 | The parser itself imposes no limit on the size of the data (apart from |
---|
| 772 | filesystem size and total memory constraints), since it only |
---|
| 773 | holds in memory at any given time a small chunk of character data (the |
---|
| 774 | default is to split the character data stream and call the |
---|
| 775 | \texttt{pcdata\_chunk\_handler} routine at the end of a line, or at |
---|
| 776 | the end of a token if the line is too long). This is one of the most |
---|
| 777 | useful features of the SAX approach to XML parsing. |
---|
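To illustrate how chunk-wise processing keeps the memory footprint
small, here is a sketch of an accumulator that updates running
statistics one chunk at a time (Welford's method). The module
\texttt{m\_stats} and its routines are hypothetical and not part of
\texttt{xmlf90}. In a real handler, each PCDATA chunk would presumably
first be converted into a small scratch array with
\texttt{build\_data\_array} (resetting its counter to zero before each
call) and then passed to \texttt{accumulate}.
%
\begin{verbatim}
module m_stats
  implicit none
  private
  public :: accumulate, report
  integer, parameter :: dp = selected_real_kind(14)
  integer, save  :: n = 0           ! number of values seen so far
  real(dp), save :: mean = 0.0_dp   ! running mean
  real(dp), save :: m2 = 0.0_dp     ! running sum of squared deviations
contains
  subroutine accumulate(values)
    ! Updates the running statistics with one chunk of values
    real(dp), dimension(:), intent(in) :: values
    real(dp) :: delta
    integer  :: i
    do i = 1, size(values)
       n = n + 1
       delta = values(i) - mean
       mean = mean + delta / n
       m2 = m2 + delta * (values(i) - mean)
    end do
  end subroutine accumulate

  subroutine report()
    ! Prints the statistics accumulated so far
    print *, "N: ", n, " mean: ", mean
    if (n > 1) print *, "std. dev.: ", sqrt(m2 / (n - 1))
  end subroutine report
end module m_stats

program stats_demo
  use m_stats, only: accumulate, report
  implicit none
  integer, parameter :: dp = selected_real_kind(14)

  ! Two "chunks", standing in for successive handler calls
  call accumulate( (/ 1.0_dp, 2.0_dp, 3.0_dp /) )
  call accumulate( (/ 4.0_dp, 5.0_dp /) )
  call report()
end program stats_demo
\end{verbatim}
%
Only the three module variables persist between calls, so the memory
needed does not grow with the size of the dataset.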
| 778 | |
---|
| 779 | In order to read numerical data with the XPATH interface in its |
---|
| 780 | current implementation, one must first read the PCDATA into the |
---|
| 781 | \texttt{pcdata} optional argument of \texttt{get\_node}, and then call |
---|
| 782 | \texttt{build\_data\_array}. However, there is an internal limit to |
---|
| 783 | the size of the PCDATA buffer, so this method cannot be safely used |
---|
| 784 | for large datasets at this point. In a forthcoming version there will |
---|
| 785 | be a generic subroutine \texttt{get\_node} with a \texttt{data} |
---|
| 786 | numerical array optional argument which will be filled by the parser |
---|
| 787 | on the fly. |
---|
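The two-step procedure just described could look roughly as follows.
This is only a sketch: the module name \texttt{flib\_xpath}, the file
name \texttt{data.xml}, the path \texttt{//data} and the buffer sizes
are assumptions, and the program needs the installed \texttt{xmlf90}
library to compile. The routine calls follow the interfaces listed in
Sect.~\ref{sec:reference}.
%
\begin{verbatim}
program read_data_xpath
  use flib_xpath          ! assumed module name (note the flib_ prefix)
  implicit none

  type(xml_t)           :: fxml
  character(len=2000)   :: pcdata   ! user buffer: must hold all the PCDATA
  real, dimension(1000) :: x        ! numerical array to hold data
  integer               :: status, ndata

  call open_xmlfile("data.xml", fxml, status)   ! hypothetical file name
  if (status /= 0) stop "Cannot open file"

  ! Step 1: read the character data of the element into pcdata
  call get_node(fxml, path="//data", pcdata=pcdata, status=status)
  if (status /= 0) stop "Element not found, or pcdata buffer overflow"

  ! Step 2: convert the character data to numbers
  ndata = 0
  call build_data_array(pcdata, x, ndata)
  print *, "Read ", ndata, " data elements."

  call close_xmlfile(fxml)
end program read_data_xpath
\end{verbatim}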
| 788 | |
---|
| 789 | |
---|
| 790 | |
---|
| 791 | |
---|
| 792 | \subsubsection{Exercises} |
---|
| 793 | \begin{enumerate} |
---|
| 794 | \item Generate an XML file containing a large dataset, and write a |
---|
| 795 | program to read the information back. You might want to include |
---|
| 796 | somewhere in the XML file information about the number of data |
---|
| 797 | elements, so that an array of the proper size can be used. |
---|
| 798 | \item Devise a strategy to read a dataset without knowing in advance |
---|
| 799 | the number of data elements. (Some possibilities: re-sizable |
---|
| 800 | allocatable arrays, two-pass parsing...). |
---|
| 801 | \item Suggest a possible encoding for the storage of two-dimensional |
---|
| 802 | arrays, and write a program to read the information from the XML file |
---|
| 803 | and create the appropriate array. |
---|
| 804 | \item Write a program that could read a 10~GB Monte Carlo simulation |
---|
| 805 | dataset and print the average and standard deviation of the data. (We |
---|
| 806 | are not advocating the use of XML for such large datasets. NetCDF |
---|
| 807 | would be much more efficient in this case). |
---|
| 808 | \end{enumerate} |
---|
| 809 | |
---|
| 810 | \subsection{Mapping of XML elements to derived types} |
---|
| 811 | |
---|
| 812 | After the parsing, the data has to be put somewhere. A good strategy |
---|
| 813 | to handle structured content is to try to replicate it within data |
---|
| 814 | structures inside the user program. For example, an element of the |
---|
| 815 | form |
---|
| 816 | % |
---|
| 817 | \begin{verbatim} |
---|
| 818 | <table units="nm" npts="100"> |
---|
| 819 | <description>Cluster diameters</description> |
---|
| 820 | <data> |
---|
| 821 | 2.3 4.5 5.6 3.4 2.3 1.2 ... |
---|
| 822 | ... |
---|
| 823 | ... |
---|
| 824 | </data> |
---|
| 825 | </table> |
---|
| 826 | \end{verbatim} |
---|
| 827 | % |
---|
| 828 | could be mapped onto a derived type of the form: |
---|
| 829 | % |
---|
| 830 | \begin{verbatim} |
---|
| 831 | type :: table |
---|
| 832 | character(len=50) :: description |
---|
| 833 | character(len=20) :: units |
---|
| 834 | integer :: npts |
---|
| 835 | real, dimension(:), pointer :: data |
---|
| 836 | end type table |
---|
| 837 | \end{verbatim} |
---|
| 838 | % |
---|
| 839 | There could even be parsing and output subroutines associated with this |
---|
| 840 | derived type, so that the user can handle the XML production and |
---|
| 841 | reading transparently. Directory \texttt{Examples/} in the |
---|
| 842 | \texttt{xmlf90} distribution contains some code along these lines. |
---|
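As a self-contained sketch of such a mapping, the hypothetical module
below wraps the \texttt{table} type with a constructor and an output
routine. The names \texttt{m\_table}, \texttt{new\_table} and
\texttt{print\_table} are illustrative and not taken from the
distribution, and the values are hard-wired where a real program would
take them from the parser.
%
\begin{verbatim}
module m_table
  implicit none
  private
  public :: table, new_table, print_table

  type :: table
     character(len=50) :: description
     character(len=20) :: units
     integer           :: npts
     real, dimension(:), pointer :: data
  end type table

contains

  subroutine new_table(t, description, units, npts)
    ! Sets the metadata and allocates room for npts data values
    type(table), intent(out)     :: t
    character(len=*), intent(in) :: description, units
    integer, intent(in)          :: npts

    t%description = description
    t%units = units
    t%npts = npts
    allocate(t%data(npts))
    t%data = 0.0
  end subroutine new_table

  subroutine print_table(t)
    type(table), intent(in) :: t
    print *, trim(t%description), " (", trim(t%units), "):"
    print *, t%data(1:t%npts)
  end subroutine print_table

end module m_table

program table_demo
  use m_table
  implicit none
  type(table) :: t

  ! In a parser these values would come from the attributes
  ! and the PCDATA of the <table> element
  call new_table(t, "Cluster diameters", "nm", 6)
  t%data = (/ 2.3, 4.5, 5.6, 3.4, 2.3, 1.2 /)
  call print_table(t)
  deallocate(t%data)
end program table_demo
\end{verbatim}
%
In a parsing routine, \texttt{units} and \texttt{npts} would be
obtained from the attribute dictionary, and the \texttt{data}
component could be filled with \texttt{build\_data\_array}.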
| 843 | |
---|
| 844 | \subsubsection{Exercises} |
---|
| 845 | % |
---|
| 846 | \begin{enumerate} |
---|
| 847 | \item Study the \texttt{pseudo} example in \texttt{Examples/sax/} and |
---|
| 848 | \texttt{Examples/xpath/}. Now, with your own application in mind, |
---|
| 849 | write derived-type definitions and parsing routines to handle your XML |
---|
| 850 | data (which would also need to be \textsl{designed} somehow). |
---|
| 851 | |
---|
| 852 | \end{enumerate} |
---|
| 853 | % |
---|
| 854 | |
---|
| 855 | |
---|
| 856 | \section{REFERENCE: Subroutine interfaces} |
---|
| 857 | \label{sec:reference} |
---|
| 858 | |
---|
| 859 | \subsection{Dictionary handling} |
---|
| 860 | |
---|
| 861 | Attribute lists are handled as instances of a derived type |
---|
| 862 | \texttt{dictionary\_t}, loosely inspired by the Python type. The |
---|
| 863 | terminology is more general: keys and entries instead of names and |
---|
| 864 | attributes. |
---|
| 865 | |
---|
| 866 | \begin{itemize} |
---|
| 867 | \item |
---|
| 868 | % |
---|
| 869 | \begin{verbatim} |
---|
| 870 | function number_of_entries(dict) result(n) |
---|
| 871 | ! |
---|
| 872 | ! Returns the number of entries in the dictionary |
---|
| 873 | ! |
---|
| 874 | type(dictionary_t), intent(in) :: dict |
---|
| 875 | integer :: n |
---|
| 876 | \end{verbatim} |
---|
| 877 | % |
---|
| 878 | \item |
---|
| 879 | % |
---|
| 880 | \begin{verbatim} |
---|
| 881 | function has_key(dict,key) result(found) |
---|
| 882 | ! |
---|
| 883 | ! Checks whether there is an entry with |
---|
| 884 | ! the given key in the dictionary |
---|
| 885 | ! |
---|
| 886 | type(dictionary_t), intent(in) :: dict |
---|
| 887 | character(len=*), intent(in) :: key |
---|
| 888 | logical :: found |
---|
| 889 | \end{verbatim} |
---|
| 890 | \item |
---|
| 891 | % |
---|
| 892 | \begin{verbatim} |
---|
| 893 | subroutine get_value(dict,key,value,status) |
---|
| 894 | ! |
---|
| 895 | ! Gets values by key |
---|
| 896 | ! |
---|
| 897 | type(dictionary_t), intent(in) :: dict |
---|
| 898 | character(len=*), intent(in) :: key |
---|
| 899 | character(len=*), intent(out) :: value |
---|
| 900 | integer, intent(out) :: status |
---|
| 901 | \end{verbatim} |
---|
| 902 | % |
---|
| 903 | \item |
---|
| 904 | % |
---|
| 905 | \begin{verbatim} |
---|
| 906 | subroutine get_key(dict,i,key,status) |
---|
| 907 | ! |
---|
| 908 | ! Gets keys by their order in the dictionary |
---|
| 909 | ! |
---|
| 910 | type(dictionary_t), intent(in) :: dict |
---|
| 911 | integer, intent(in) :: i |
---|
| 912 | character(len=*), intent(out) :: key |
---|
| 913 | integer, intent(out) :: status |
---|
| 914 | |
---|
| 915 | \end{verbatim} |
---|
| 916 | % |
---|
| 917 | \item |
---|
| 918 | % |
---|
| 919 | \begin{verbatim} |
---|
| 920 | subroutine print_dict(dict) |
---|
| 921 | ! |
---|
| 922 | ! Prints the contents of the dictionary to stdout |
---|
| 923 | ! |
---|
| 924 | type(dictionary_t), intent(in) :: dict |
---|
| 925 | \end{verbatim} |
---|
| 926 | \end{itemize} |
---|
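A possible use of these routines inside a SAX handler is sketched
below. The module name \texttt{flib\_sax}, the buffer lengths, the
attribute name \texttt{units} and the convention that a zero
\texttt{status} means success are assumptions; the routine signatures
are the ones listed above. The module needs the installed
\texttt{xmlf90} library to compile.
%
\begin{verbatim}
module m_handlers
  use flib_sax        ! assumed module name (note the flib_ prefix)
  implicit none
  private
  public :: begin_element

contains

  subroutine begin_element(name, attributes)
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    character(len=100) :: key, value   ! hypothetical fixed-size buffers
    integer            :: i, status

    print *, "Element: ", trim(name)

    ! Loop over all the entries, in dictionary order
    do i = 1, number_of_entries(attributes)
       call get_key(attributes, i, key, status)
       if (status /= 0) cycle
       call get_value(attributes, trim(key), value, status)
       if (status == 0) print *, "  ", trim(key), " = ", trim(value)
    end do

    ! Direct look-up of a single attribute, if present
    if (has_key(attributes, "units")) then
       call get_value(attributes, "units", value, status)
       if (status == 0) print *, "  units are: ", trim(value)
    end if
  end subroutine begin_element

end module m_handlers
\end{verbatim}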
| 927 | |
---|
| 928 | \subsection{SAX interface} |
---|
| 929 | |
---|
| 930 | \begin{itemize} |
---|
| 931 | \item |
---|
| 932 | \begin{verbatim} |
---|
| 933 | subroutine open_xmlfile(fname,fxml,iostat) |
---|
| 934 | ! |
---|
| 935 | ! Opens the file "fname" and creates an xml handle fxml |
---|
| 936 | ! iostat /= 0 on error. |
---|
| 937 | ! |
---|
| 938 | character(len=*), intent(in) :: fname |
---|
| 939 | integer, intent(out) :: iostat |
---|
| 940 | type(xml_t), intent(out) :: fxml |
---|
| 941 | \end{verbatim} |
---|
| 942 | \item |
---|
| 943 | \begin{verbatim} |
---|
| 944 | subroutine xml_parse(fxml, begin_element_handler, & |
---|
| 945 | end_element_handler, & |
---|
| 946 | pcdata_chunk_handler, & |
---|
| 947 | comment_handler, & |
---|
| 948 | xml_declaration_handler, & |
---|
| 949 | sgml_declaration_handler, & |
---|
| 950 | error_handler, & |
---|
| 951 | signal_handler, & |
---|
| 952 | verbose, & |
---|
| 953 | empty_element_handler) |
---|
| 954 | |
---|
| 955 | type(xml_t), intent(inout), target :: fxml |
---|
| 956 | |
---|
| 957 | optional :: begin_element_handler |
---|
| 958 | optional :: end_element_handler |
---|
| 959 | optional :: pcdata_chunk_handler |
---|
| 960 | optional :: comment_handler |
---|
| 961 | optional :: xml_declaration_handler |
---|
| 962 | optional :: sgml_declaration_handler |
---|
| 963 | optional :: error_handler |
---|
| 964 | optional :: signal_handler ! see XPATH code |
---|
| 965 | logical, intent(in), optional :: verbose |
---|
| 966 | optional :: empty_element_handler |
---|
| 967 | |
---|
| 968 | \end{verbatim} |
---|
| 969 | \item Interfaces for handlers follow: |
---|
| 970 | |
---|
| 971 | \begin{verbatim} |
---|
| 972 | subroutine begin_element_handler(name,attributes) |
---|
| 973 | character(len=*), intent(in) :: name |
---|
| 974 | type(dictionary_t), intent(in) :: attributes |
---|
| 975 | end subroutine begin_element_handler |
---|
| 976 | |
---|
| 977 | subroutine end_element_handler(name) |
---|
| 978 | character(len=*), intent(in) :: name |
---|
| 979 | end subroutine end_element_handler |
---|
| 980 | |
---|
| 981 | subroutine pcdata_chunk_handler(chunk) |
---|
| 982 | character(len=*), intent(in) :: chunk |
---|
| 983 | end subroutine pcdata_chunk_handler |
---|
| 984 | |
---|
| 985 | subroutine comment_handler(comment) |
---|
| 986 | character(len=*), intent(in) :: comment |
---|
| 987 | end subroutine comment_handler |
---|
| 988 | |
---|
| 989 | subroutine xml_declaration_handler(name,attributes) |
---|
| 990 | character(len=*), intent(in) :: name |
---|
| 991 | type(dictionary_t), intent(in) :: attributes |
---|
| 992 | end subroutine xml_declaration_handler |
---|
| 993 | |
---|
| 994 | subroutine sgml_declaration_handler(sgml_declaration) |
---|
| 995 | character(len=*), intent(in) :: sgml_declaration |
---|
| 996 | end subroutine sgml_declaration_handler |
---|
| 997 | |
---|
| 998 | subroutine error_handler(error_info) |
---|
| 999 | type(xml_error_t), intent(in) :: error_info |
---|
| 1000 | end subroutine error_handler |
---|
| 1001 | |
---|
| 1002 | subroutine signal_handler(code) |
---|
| 1003 | logical, intent(out) :: code |
---|
| 1004 | end subroutine signal_handler |
---|
| 1005 | |
---|
| 1006 | subroutine empty_element_handler(name,attributes) |
---|
| 1007 | character(len=*), intent(in) :: name |
---|
| 1008 | type(dictionary_t), intent(in) :: attributes |
---|
| 1009 | end subroutine empty_element_handler |
---|
| 1010 | \end{verbatim} |
---|
| 1011 | \end{itemize} |
---|
| 1012 | |
---|
| 1013 | Other file handling routines (some of them really only useful within |
---|
| 1014 | the XPATH interface): |
---|
| 1015 | |
---|
| 1016 | \begin{itemize} |
---|
| 1017 | \item |
---|
| 1018 | \begin{verbatim} |
---|
| 1019 | subroutine REWIND_XMLFILE(fxml) |
---|
| 1020 | ! |
---|
| 1021 | ! Rewinds the physical file associated to fxml and clears the data |
---|
| 1022 | ! structures used in parsing. |
---|
| 1023 | ! |
---|
| 1024 | type(xml_t), intent(inout) :: fxml |
---|
| 1025 | \end{verbatim} |
---|
| 1026 | |
---|
| 1027 | \item |
---|
| 1028 | \begin{verbatim} |
---|
| 1029 | subroutine SYNC_XMLFILE(fxml,status) |
---|
| 1030 | ! |
---|
| 1031 | ! Synchronizes the physical file associated to fxml so that reading |
---|
| 1032 | ! can resume at the exact point in the parsing saved in fxml. |
---|
| 1033 | ! |
---|
| 1034 | type(xml_t), intent(inout) :: fxml |
---|
| 1035 | integer, intent(out) :: status |
---|
| 1036 | |
---|
| 1037 | \end{verbatim} |
---|
| 1038 | \item |
---|
| 1039 | \begin{verbatim} |
---|
| 1040 | subroutine CLOSE_XMLFILE(fxml) |
---|
| 1041 | ! |
---|
| 1042 | ! Closes the file handle fxml (and the associated OS file object) |
---|
| 1043 | ! |
---|
| 1044 | type(xml_t), intent(inout) :: fxml |
---|
| 1045 | \end{verbatim} |
---|
| 1046 | \end{itemize} |
---|
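Put together, a driver program for the SAX interface might look like
the sketch below. The module name \texttt{flib\_sax}, the file name
\texttt{inventory.xml} and a user module \texttt{m\_handlers}
exporting \texttt{begin\_element}, \texttt{end\_element} and
\texttt{pcdata\_chunk\_handler} are assumptions; the calls follow the
interfaces listed above, and the program needs the installed
\texttt{xmlf90} library to compile.
%
\begin{verbatim}
program sax_driver
  use flib_sax        ! assumed module name (note the flib_ prefix)
  use m_handlers      ! user module assumed to provide the handlers
  implicit none

  type(xml_t) :: fxml
  integer     :: iostat

  call open_xmlfile("inventory.xml", fxml, iostat)  ! hypothetical file
  if (iostat /= 0) stop "Cannot open XML file"

  ! All handlers are optional: pass only those that are needed
  call xml_parse(fxml, begin_element_handler=begin_element,       &
                       end_element_handler=end_element,           &
                       pcdata_chunk_handler=pcdata_chunk_handler, &
                       verbose=.false.)

  call close_xmlfile(fxml)
end program sax_driver
\end{verbatim}
%
Passing the handlers by keyword keeps the call readable and makes it
independent of the order of the optional arguments.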
| 1047 | |
---|
| 1048 | \subsection{XPATH interface} |
---|
| 1049 | % |
---|
| 1050 | \begin{itemize} |
---|
| 1051 | \item |
---|
| 1052 | \begin{verbatim} |
---|
| 1053 | subroutine MARK_NODE(fxml,path,att_name,att_value,attributes,status) |
---|
| 1054 | ! |
---|
| 1055 | ! Performs a search of a given element (by path, and/or presence of |
---|
| 1056 | ! a given attribute and/or value of that attribute), returning optionally |
---|
| 1057 | ! the element's attribute dictionary, and leaving the file handle fxml |
---|
| 1058 | ! ready to process the rest of the element's contents (child elements |
---|
| 1059 | ! and/or pcdata). |
---|
| 1060 | ! |
---|
| 1061 | ! Side effects: it sets a "path_mark" in fxml to enable its use as a |
---|
| 1062 | ! context. |
---|
| 1063 | ! |
---|
| 1064 | ! If the argument "path" is present and evaluates to a relative path (a |
---|
| 1065 | ! string not beginning with "/"), the search is interrupted after the end |
---|
| 1066 | ! of the "ancestor_element" set by a previous call to "mark_node". |
---|
| 1067 | ! If not earlier, the search ends at the end of the file. |
---|
| 1068 | ! |
---|
| 1069 | ! The status argument, if present, will hold a return value, |
---|
| 1070 | ! which will be: |
---|
| 1071 | ! |
---|
| 1072 | ! 0 on success, |
---|
| 1073 | ! negative in case of end-of-file or end-of-ancestor-element, or |
---|
| 1074 | ! positive in case of other malfunction |
---|
| 1075 | ! |
---|
| 1076 | type(xml_t), intent(inout), target :: fxml |
---|
| 1077 | character(len=*), intent(in), optional :: path |
---|
| 1078 | character(len=*), intent(in), optional :: att_name |
---|
| 1079 | character(len=*), intent(in), optional :: att_value |
---|
| 1080 | type(dictionary_t), intent(out), optional :: attributes |
---|
| 1081 | integer, intent(out), optional :: status |
---|
| 1082 | \end{verbatim} |
---|
| 1083 | |
---|
| 1084 | \item |
---|
| 1085 | \begin{verbatim} |
---|
| 1086 | subroutine GET_NODE(fxml,path,att_name,att_value,attributes,pcdata,status) |
---|
| 1087 | ! |
---|
| 1088 | ! Performs a search of a given element (by path, and/or presence of |
---|
| 1089 | ! a given attribute and/or value of that attribute), returning optionally |
---|
| 1090 | ! the element's attribute dictionary and any PCDATA characters contained |
---|
| 1091 | ! in the element's scope (but not child elements). It leaves the file handle |
---|
| 1092 | ! physically and logically positioned: |
---|
| 1093 | ! |
---|
| 1094 | ! after the end of the element's start tag if 'pcdata' is not present |
---|
| 1095 | ! after the end of the element's end tag if 'pcdata' is present |
---|
| 1096 | ! |
---|
| 1097 | ! If the argument "path" is present and evaluates to a relative path (a |
---|
| 1098 | ! string not beginning with "/"), the search is interrupted after the end |
---|
| 1099 | ! of the "ancestor_element" set by a previous call to "mark_node". |
---|
| 1100 | ! If not earlier, the search ends at the end of the file. |
---|
| 1101 | ! |
---|
| 1102 | ! The status argument, if present, will hold a return value, |
---|
| 1103 | ! which will be: |
---|
| 1104 | ! |
---|
| 1105 | ! 0 on success, |
---|
| 1106 | ! negative in case of end-of-file or end-of-ancestor-element, or |
---|
| 1107 | ! positive in case of a malfunction (such as the overflow of the |
---|
| 1108 | ! user's pcdata buffer). |
---|
| 1109 | ! |
---|
| 1110 | type(xml_t), intent(inout), target :: fxml |
---|
| 1111 | character(len=*), intent(in), optional :: path |
---|
| 1112 | character(len=*), intent(in), optional :: att_name |
---|
| 1113 | character(len=*), intent(in), optional :: att_value |
---|
| 1114 | type(dictionary_t), intent(out), optional :: attributes |
---|
| 1115 | character(len=*), intent(out), optional :: pcdata |
---|
| 1116 | integer, intent(out), optional :: status |
---|
| 1117 | \end{verbatim} |
---|
| 1118 | \end{itemize} |
---|
| 1119 | % |
---|
| 1120 | \subsection{PCDATA conversion routines} |
---|
| 1121 | \begin{itemize} |
---|
| 1122 | \item |
---|
| 1123 | |
---|
| 1124 | \begin{verbatim} |
---|
| 1125 | subroutine build_data_array(str,x,n) |
---|
| 1126 | ! |
---|
| 1127 | ! Incrementally builds the data array x from |
---|
| 1128 | ! character data contained in str. n holds |
---|
| 1129 | ! the number of entries of x set so far. |
---|
| 1130 | ! |
---|
| 1131 | character(len=*), intent(in) :: str |
---|
| 1132 | NUMERIC TYPE, dimension(:), intent(inout) :: x |
---|
| 1133 | integer, intent(inout) :: n |
---|
| 1134 | ! |
---|
| 1135 | ! NUMERIC TYPE can be any of: |
---|
| 1136 | ! integer |
---|
| 1137 | ! real |
---|
| 1138 | ! real(kind=selected_real_kind(14)) |
---|
| 1139 | ! |
---|
| 1140 | \end{verbatim} |
---|
| 1141 | \end{itemize} |
---|
| 1142 | |
---|
| 1143 | \subsection{Other utility routines} |
---|
| 1144 | \begin{itemize} |
---|
| 1145 | \item |
---|
| 1146 | |
---|
| 1147 | \begin{verbatim} |
---|
| 1148 | function xml_char_count(fxml) result (nc) |
---|
| 1149 | ! |
---|
| 1150 | ! Provides the value of the processed-characters counter |
---|
| 1151 | ! |
---|
| 1152 | type(xml_t), intent(in) :: fxml |
---|
| 1153 | integer :: nc |
---|
| 1154 | |
---|
| 1155 | nc = nchars_processed(fxml%fb) |
---|
| 1156 | |
---|
| 1157 | end function xml_char_count |
---|
| 1158 | \end{verbatim} |
---|
| 1159 | \end{itemize} |
---|
| 1160 | |
---|
| 1161 | \section{Other parser features, limitations, and design issues} |
---|
| 1162 | |
---|
| 1163 | \subsection{Features} |
---|
| 1164 | \begin{itemize} |
---|
| 1165 | \item |
---|
| 1166 | The parser can detect badly formed documents, giving by default an |
---|
| 1167 | error report including the line and column where it happened. It also |
---|
| 1168 | will accept an \texttt{error\_handler} routine as another optional |
---|
| 1169 | argument, for finer control by the user. In the SAX interface, if the |
---|
| 1170 | optional logical argument \texttt{verbose} is present and \texttt{.true.}, the |
---|
| 1171 | parser will offer detailed information about its inner workings. In |
---|
| 1172 | the XPATH interface, there is a pair of routines, |
---|
| 1173 | \texttt{enable\_debug} and \texttt{disable\_debug}, to control |
---|
| 1174 | verbosity. See \texttt{Examples/xpath/} for examples. |
---|
| 1175 | |
---|
| 1176 | \item |
---|
| 1177 | It ignores PCDATA outside of element context (and warns about it). |
---|
| 1178 | |
---|
| 1179 | \item |
---|
| 1180 | Attribute values can be specified using both single and double |
---|
| 1181 | quotes (as per the XML specs). |
---|
| 1182 | |
---|
| 1183 | \item |
---|
| 1184 | It processes the default entities: \&gt; \&amp; \&lt; \&apos; and |
---|
| 1185 | \&quot; and decimal and hex character entities (for example: \&\#123; |
---|
| 1186 | \&\#x4E;). The processing is not |
---|
| 1187 | ``on the fly'', but after reading chunks of PCDATA. |
---|
| 1188 | |
---|
| 1189 | \item |
---|
| 1190 | It understands and processes CDATA sections (transparently passed as |
---|
| 1191 | PCDATA to the handler). |
---|
| 1192 | |
---|
| 1193 | \end{itemize} |
---|
| 1194 | |
---|
| 1195 | See \texttt{Examples/sax/features} for an illustration of the above |
---|
| 1196 | features. |
---|
| 1197 | |
---|
| 1198 | \subsection{Limitations} |
---|
| 1199 | \begin{itemize} |
---|
| 1200 | |
---|
| 1201 | \item It is not a validating parser. |
---|
| 1202 | |
---|
| 1203 | \item It accepts only single-byte encodings for characters. |
---|
| 1204 | |
---|
| 1205 | \item Currently, there are hard-wired limits on the length of element |
---|
| 1206 | and attribute identifiers, and the length of attribute values and |
---|
| 1207 | unbroken (i.e., without whitespace) PCDATA sections. The limit is |
---|
| 1208 | set in \texttt{sax/m\_buffer.f90} to \texttt{MAX\_BUFF\_SIZE=300}. |
---|
| 1209 | |
---|
| 1210 | \item Overly long comments and SGML declarations can also be |
---|
| 1211 | truncated, but the effect is currently harmless since the parser does |
---|
| 1212 | not make use of that information. In a future version there could be a |
---|
| 1213 | more robust retrieval mechanism. |
---|
| 1214 | |
---|
| 1215 | \item The number of attributes is limited to \texttt{MAX\_ITEMS=20} |
---|
| 1216 | in \texttt{sax/m\_dictionary.f90}. |
---|
| 1217 | |
---|
| 1218 | |
---|
| 1219 | \item In the XPATH interface, returned PCDATA character buffers |
---|
| 1220 | cannot be larger than an internal size of |
---|
| 1221 | \texttt{MAX\_PCDATA\_SIZE=65536} set in \texttt{xpath/m\_path.f90}. |
---|
| 1222 | |
---|
| 1223 | |
---|
| 1224 | \end{itemize} |
---|
| 1225 | |
---|
| 1226 | \subsection{Design Issues} |
---|
| 1227 | |
---|
| 1228 | See \texttt{\{sax,xpath\}/Developer.Guide}. |
---|
| 1229 | |
---|
| 1230 | The parser is actually written in the \texttt{F} subset of Fortran90, |
---|
| 1231 | for which inexpensive compilers are available. (See |
---|
| 1232 | \texttt{http://fortran.com/imagine1/}). |
---|
| 1233 | |
---|
| 1234 | There are two other projects aimed at parsing XML in Fortran: those of |
---|
| 1235 | Mart Rentmeester (\texttt{http://nn-online.sci.kun.nl/fortran/}) and |
---|
| 1236 | Arjen Markus (\texttt{http://xml-fortran.sourceforge.net/}). Up to |
---|
| 1237 | this point the three projects have progressed independently, but it is |
---|
| 1238 | anticipated that there will be a pooling of efforts in the near |
---|
| 1239 | future. |
---|
| 1240 | |
---|
| 1241 | \newpage |
---|
| 1242 | \section{Installation Instructions} |
---|
| 1243 | % |
---|
| 1244 | There is extensible built-in support for arbitrary compilers. The |
---|
| 1245 | setup discussed below is taken from the author's \texttt{flib} |
---|
| 1246 | project.\footnote{There seem to be other projects with that very obvious |
---|
| 1247 | name...} The idea is to have a configurable repository of useful |
---|
| 1248 | modules and library objects which can be accessed by Fortran |
---|
| 1249 | programs. Different compilers are supported by tailored macros. |
---|
| 1250 | |
---|
| 1251 | \texttt{xmlf90} is just one of several packages in \texttt{flib}, |
---|
| 1252 | hence the \texttt{flib\_} prefix in the package's visible module |
---|
| 1253 | names. |
---|
| 1254 | |
---|
| 1255 | To install the package, follow these steps: |
---|
| 1256 | |
---|
| 1257 | \begin{verbatim} |
---|
| 1258 | |
---|
| 1259 | * Create a directory somewhere containing a copy of the files in the |
---|
| 1260 | subdirectory 'macros': |
---|
| 1261 | |
---|
| 1262 | cp -rp macros $HOME/flib |
---|
| 1263 | |
---|
| 1264 | * Define the environment variable FLIB_ROOT to point to that directory. |
---|
| 1265 | |
---|
| 1266 | FLIB_ROOT=$HOME/flib ; export FLIB_ROOT (sh-like shells) |
---|
| 1267 | setenv FLIB_ROOT $HOME/flib (csh-like shells) |
---|
| 1268 | |
---|
| 1269 | |
---|
| 1270 | * Go into $FLIB_ROOT, look through the fortran-XXXX.mk files, |
---|
| 1271 | and see if one of them applies to your computer/compiler combination. |
---|
| 1272 | If so, copy it or make a (symbolic) link to 'fortran.mk': |
---|
| 1273 | |
---|
| 1274 | ln -sf fortran-lf95.mk fortran.mk |
---|
| 1275 | |
---|
| 1276 | If none of the .mk files look useful, write your own, using the |
---|
| 1277 | files provided as a guide. Basically you need to figure out the |
---|
| 1278 | name and options for the compiler, the extension assigned to |
---|
| 1279 | module files, and the flag used to identify the module search path. |
---|
| 1280 | |
---|
| 1281 | The above steps need only be done once. |
---|
| 1282 | |
---|
| 1283 | * Go into subdirectory 'sax' and type 'make'. |
---|
| 1284 | * Go into subdirectory 'xpath' and type 'make'. |
---|
| 1285 | * Go into subdirectory 'Tutorial' and try the exercises in this guide |
---|
| 1286 | (see the next section for compilation details). |
---|
| 1287 | * Go into subdirectory 'Examples' and explore. |
---|
| 1288 | |
---|
| 1289 | \end{verbatim} |
---|
| 1290 | % |
---|
| 1291 | \section{Compiling user programs} |
---|
| 1292 | \label{sec:compiling} |
---|
| 1293 | |
---|
| 1294 | After installation, the appropriate modules and library files should |
---|
| 1295 | already be in \texttt{\$FLIB\_ROOT/modules} and |
---|
| 1296 | \texttt{\$FLIB\_ROOT/lib}, respectively. To compile user programs, it |
---|
| 1297 | is suggested that the user create a separate directory to hold the |
---|
| 1298 | program files and prepare a \texttt{Makefile} following the template |
---|
| 1299 | (taken from \texttt{Examples/sax/simple/}): |
---|
| 1300 | |
---|
| 1301 | \begin{verbatim} |
---|
| 1302 | #--------------------------------------------------------------- |
---|
| 1303 | # |
---|
| 1304 | default: example |
---|
| 1305 | # |
---|
| 1306 | #--------------------------- |
---|
| 1307 | MK=$(FLIB_ROOT)/fortran.mk |
---|
| 1308 | include $(MK) |
---|
| 1309 | #--------------------------- |
---|
| 1310 | # |
---|
| 1311 | # Comment out the following line to disable debugging support |
---|
| 1312 | # |
---|
| 1313 | FFLAGS=$(FFLAGS_DEBUG) |
---|
| 1314 | # |
---|
| 1315 | LIBS=$(LIB_PREFIX)$(LIB_STD) -lflib |
---|
| 1316 | # |
---|
| 1317 | OBJS= m_handlers.o example.o |
---|
| 1318 | |
---|
| 1319 | example: $(OBJS) |
---|
| 1320 | $(FC) $(LDFLAGS) -o $@ $(OBJS) $(LIBS) |
---|
| 1321 | # |
---|
| 1322 | clean: |
---|
| 1323 | rm -f *.o example *$(MOD_EXT) |
---|
| 1324 | # |
---|
| 1325 | #--------------------------------------------------------------- |
---|
| 1326 | \end{verbatim} |
---|
| 1327 | % |
---|
| 1328 | Here it is assumed that the user has two source files, |
---|
| 1329 | \texttt{example.f90} and \texttt{m\_handlers.f90}. Simply typing |
---|
| 1330 | \texttt{make} will compile \texttt{example}, pulling in all the needed |
---|
| 1331 | modules and library objects. |
---|
| 1332 | |
---|
| 1333 | |
---|
| 1334 | \end{document} |
---|