Working with XML files in SAS
Manipulating XML documents is an important part of programming in clinical research, most visibly through the Operational Data Model (ODM). ODM is an XML-based format for transmitting data used in the conduct of a clinical trial. SAS provides support for importing and exporting ODM.
The XMLV2 LIBNAME engine processes XML documents. With it, you can export SAS data to an XML document.
XML documents are portable, so you can move the data to another server or host. You can also import an XML file and read it as a SAS dataset.
To use the XMLV2 engine, you assign a libref with a LIBNAME statement and name the engine in that statement. The libref remains valid for the duration of the SAS session.
A libref that uses the XMLV2 engine can be assigned to a specific XML document or to a directory.
Import XML document to SAS
Suppose that you want to import an XML document as a SAS dataset. The following LIBNAME statement assigns a libref to a specific XML document and specifies the XMLV2 engine:
libname myxml xmlv2 'file path\file.xml';
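Once the libref is assigned, the document can be read like any other SAS library member. The sketch below is illustrative only: the path is a placeholder, and the member name xml_dataset is an assumption, since the actual name depends on the elements in your XML document.

```sas
/* Placeholder path: replace with the location of your XML document. */
libname myxml xmlv2 'file path\file.xml';

/* List the tables that the engine finds in the document. */
proc datasets library=myxml;
quit;

/* Copy one table into WORK. The member name xml_dataset is assumed. */
data work.imported;
    set myxml.xml_dataset;
run;

/* Release the libref when you are done. */
libname myxml clear;
```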
Export XML document from SAS
To export to an XML document, you again use a LIBNAME statement with the XMLV2 engine, this time assigning the libref to the XML document that will be created.
Let’s see that through an example:
In this example, the first LIBNAME statement points to the directory that contains the SAS dataset to be exported.
The second LIBNAME statement assigns the libref and points it to the XML document that will be the result of the export.
libname myfiles 'SAS file path';
libname myxml xmlv2 'file path\xml file.xml';
Executing the following DATA step creates the XML document that the myxml libref points to:
data myxml.xml_dataset;
    set myfiles.sas_dataset;
run;
Note that the export is a one-time copy: if the SAS dataset is updated later, the XML document is not updated with it.
Also, you cannot use PROC SORT, or ORDER BY in PROC SQL, directly on an XML dataset. The XMLV2 engine is a sequential access engine, which means that it processes data one record after the other, whereas these operations require random (direct) access. To sort the data, copy it into a regular SAS dataset first.
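A common way around the sequential-access restriction is to copy the XML data into a temporary SAS dataset and sort that copy. The following is a minimal sketch: the path is a placeholder, and the names xml_dataset and subject_id are hypothetical.

```sas
/* Placeholder path: replace with the location of your XML document. */
libname myxml xmlv2 'file path\file.xml';

/* Reading sequentially is allowed, so copy the data into WORK first. */
data work.xml_copy;
    set myxml.xml_dataset;   /* assumed member name */
run;

/* The WORK copy is a regular SAS dataset and can be sorted. */
proc sort data=work.xml_copy;
    by subject_id;           /* hypothetical variable */
run;
```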
When you transfer the file between environments (for example, with File Transfer Protocol), you have to know what the document contains in order to choose the appropriate transfer mode. If the document contains an encoding attribute in the XML declaration or a byte-order mark, transfer the file in binary mode. If the document contains neither and you are transferring it between similar hosts, transfer the file in text mode.
What is a byte-order mark?
The byte-order mark (BOM) is a short sequence of bytes placed at the beginning of an encoded text file. It allows the program that reads the file to determine which character encoding to use when interpreting the file as text. Byte-order marks are not specific to XML, but they are commonly seen in XML data.
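To check whether a document starts with a byte-order mark, you can read its first bytes in binary mode and look at them in hexadecimal. The sketch below assumes a placeholder path and a file of at least three bytes, and it only recognizes the UTF-8 and UTF-16 marks.

```sas
/* Placeholder path: replace with the location of your XML document. */
filename xmlin 'file path\file.xml';

data _null_;
    /* RECFM=N reads the file as a raw byte stream. */
    infile xmlin recfm=n;
    input first3 $char3.;

    /* Convert the three bytes to six hexadecimal digits. */
    length hex $6;
    hex = put(first3, $hex6.);

    if hex = 'EFBBBF' then put 'UTF-8 byte-order mark found';
    else if hex =: 'FFFE' then put 'UTF-16 little-endian byte-order mark found';
    else if hex =: 'FEFF' then put 'UTF-16 big-endian byte-order mark found';
    else put 'No byte-order mark found: ' hex=;

    stop;
run;
```

The result is written to the SAS log.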
When you import an XML document that was created in a character encoding different from that of your SAS session, be aware that transcoding errors are possible. If the XML document does not specify an encoding attribute in the XML declaration, SAS attempts to identify the encoding from a byte-order mark.
If SAS finds neither an encoding attribute nor a byte-order mark, it assumes the session encoding. If the actual encoding of the document differs from the session encoding, a transcoding error can occur.
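Before importing, it can help to check which encoding your SAS session is using so that you can compare it with the document. A small sketch:

```sas
/* Show the ENCODING= system option for the current session. */
proc options option=encoding;
run;

/* The same information is available in an automatic macro variable. */
%put &=sysencoding;
```

Both statements write their result to the SAS log.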
When you export an XML document, the XML declaration contains an encoding attribute by default. SAS writes this attribute based on the session.
<?xml version="1.0" encoding="utf-8" ?>
You can override the encoding that SAS writes to the XML document by specifying the XMLENCODING= option in the LIBNAME statement.
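For illustration, an export that requests a specific encoding could look like the sketch below. The paths are placeholders, the dataset names follow the earlier export example, and the encoding value is an arbitrary choice.

```sas
/* Placeholder paths: replace with your own locations. */
libname myfiles 'SAS file path';
libname myxml xmlv2 'file path\xml file.xml' xmlencoding='utf-8';

/* The exported document is written with the requested encoding. */
data myxml.xml_dataset;
    set myfiles.sas_dataset;
run;

libname myxml clear;
```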