This post is part of a series of notes that I’m putting together regarding XML within SharePoint.
- Intro and History of XML
- XML in SharePoint 101
- XML in SharePoint Search Pt. 1 (This post)
- XML in SharePoint Search Pt. 2
Introduction
- XML – eXtensible Markup Language.
- XSLT – eXtensible Stylesheet Language Transformations.
Microsoft made quite a direction change with SharePoint 2010 by changing to XML data views in place of the old List View WebParts from 2007. As a result, even if you had steadfastly been avoiding XML/XSLT in SharePoint 2007, you were faced with it now.
I’d already played heavily with XML/XSLT in the Search result web parts in 2007, and even now that remains one of the areas where I am most likely to be using XSLT style sheets to display XML-based data in a meaningful way. For recent clients I’ve used the People Search Core Results web part to produce department listings, phone directories and even First Aider and Fire Marshal lists through the use of pre-canned search results and Managed Metadata in User Profiles.
In this series of posts I’m going to look at XML with respect to SharePoint. I’m not going to try to teach you XML/XSLT beyond some basic terms. If that’s what you need, then I would suggest looking at www.w3schools.com and perhaps some of the Tennison books. (NB: SharePoint uses XSLT 1.0, so you don’t have quite as many fancy extensions as you may like. However, Microsoft does have some extensions of its own that you can call upon within SharePoint, the most notable of which is the DDWRT extension library, which has been around since SharePoint 2003.)
History
Some brief history. The early precursor to XML was GML, invented by Goldfarb, Mosher and Lorie of IBM as a descriptor language for marking up technical documents.
This became SGML, or Standard Generalized Markup Language. SGML wasn’t in itself a markup language, more of a descriptor of markup languages, the most famous of which is HTML, still in heavy use today.
The key problem with HTML is that it is inherently unstructured, with loose adherence to the rules. For example, browsers happily handle unterminated tags, something which would cause an XML parser to throw a fit.
The other problem with HTML is separation, or rather the lack of it: the markup of an HTML document generally controls the way the document is presented. (I say generally here because we do have far greater control over HTML, especially since the advent of XHTML from the W3C.)
With XML (generally attributed to a group of people headed by Jon Bosak), the separation of data and presentation is complete: XML describes the data, and XSLT is used to present this data in a form meaningful to a human being.
What is XML?
Well, personally I think the answer is many things. However, at its heart, XML is a structured way of describing information within a defined set of rules.
These can be rules that you keep in your head and adhere to yourself in your own code, or you can define a set of rules known as a Schema. (There is also an animal called a DTD, or Document Type Definition. However, DTDs are less powerful than Schemas and I find them much rarer in the wild now.)
The standard example found across the internet is to describe your movie or CD collection. Instead, I’m going to use trees as an example of data.
So if we were to classify the trees in my garden, I might use the following structure:
<?xml version="1.0" encoding="utf-8"?>
<Trees>
  <Tree Genus="Prunus" Subgenus="Amygdalus">
    <CommonName>Almond</CommonName>
  </Tree>
  <Tree Genus="Taxus" Subgenus="Baccata">
    <CommonName>Yew</CommonName>
    <CommonName>English Yew</CommonName>
    <CommonName>European Yew</CommonName>
  </Tree>
  <Tree Genus="Buxus" Subgenus="Sempervirens">
    <CommonName>Boxwood</CommonName>
  </Tree>
</Trees>
The first line is the declaration, which marks this as an XML document conforming to version 1.0 of the XML specification and encoded as UTF-8. This is generally the only XML declaration you’ll see in an English XML document.
After this come nested XML elements, each of which may or may not have attributes within its start tag and may or may not have data between its tags. What a document will ALWAYS have is a single root element (there can be only one... think Highlander!), a start tag and an end tag for every element (or the self-closing shorthand for an empty element), and no overlapping of the structure!
A document that adheres to the structure rules of XML is considered a Well-Formed XML document.
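The well-formedness rules above can be checked with any XML parser. Here is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the helper name is_well_formed and the small sample strings are my own illustrations, not anything SharePoint provides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule described above.
samples = {
    "good": "<Trees><Tree Genus='Taxus'><CommonName>Yew</CommonName></Tree></Trees>",
    "two root elements": "<Trees></Trees><Shrubs></Shrubs>",
    "missing end tag": "<Trees><Tree></Trees>",
    "overlapping tags": "<Trees><Tree><CommonName>Yew</Tree></CommonName></Trees>",
}

for label, document in samples.items():
    print(label, "->", is_well_formed(document))
```

Only the first sample should come back as well-formed. The other three each break one of the rules: a single root, matching start and end tags, and no overlapping.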
The first element, Trees, is the root element of our XML document. Within it we see our first child elements, named Tree, each of which has Genus and Subgenus attributes denoting the Latin name for the tree. (XML is case-sensitive, so Tree and TREE would be two different element names.)
Within each Tree element, we also have a CommonName element, with the common name as the data within the tags. You’ll also note that we have more than one common name in our example.
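To make the structure concrete, here is a rough sketch of walking the same document from code, again in Python with xml.etree.ElementTree. The function name common_names_by_genus is just an illustrative name of mine; the element and attribute names come from the sample document.

```python
import xml.etree.ElementTree as ET

TREES_XML = """<?xml version="1.0" encoding="utf-8"?>
<Trees>
  <Tree Genus="Prunus" Subgenus="Amygdalus">
    <CommonName>Almond</CommonName>
  </Tree>
  <Tree Genus="Taxus" Subgenus="Baccata">
    <CommonName>Yew</CommonName>
    <CommonName>English Yew</CommonName>
    <CommonName>European Yew</CommonName>
  </Tree>
  <Tree Genus="Buxus" Subgenus="Sempervirens">
    <CommonName>Boxwood</CommonName>
  </Tree>
</Trees>"""


def common_names_by_genus(root):
    """Map each Tree's Genus attribute to the list of its CommonName values."""
    result = {}
    for tree in root.findall("Tree"):          # direct Tree children of the root
        genus = tree.get("Genus")              # attribute lookup
        names = [name.text for name in tree.findall("CommonName")]
        result[genus] = names
    return result


root = ET.fromstring(TREES_XML)                # root is the Trees element
names = common_names_by_genus(root)
for genus, common in names.items():
    print(genus + ": " + ", ".join(common))
```

Attributes are read with get(), child elements are found by name, and the data between the tags is available as text. The Yew entry comes back with three common names because the element is repeated.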
If we were to write a schema for this file, we would probably start with a list of rules such as:
- Each Tree element MUST have a Genus attribute
- Each Tree element MUST have a Subgenus attribute
- Each Tree element may have zero or more CommonName child elements
A document that conforms to these Schema rules is considered to be a VALID XML document.
Note those two terms, as they crop up heavily throughout XML usage: well-formed means the document conforms to the rules of XML, and valid means it conforms to the Schema applied to the document.
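The difference between the two terms can be sketched in code. Python's standard library has no schema validator, so the following is NOT real Schema validation, only a hand-rolled check of the three rules listed above. The function name check_tree_rules is hypothetical, and treating any child other than CommonName as a violation is my own strict reading of the third rule.

```python
import xml.etree.ElementTree as ET


def check_tree_rules(xml_text):
    """Return a list of rule violations; an empty list means every rule passed.

    Parsing comes first: a document that is not well-formed raises
    ET.ParseError before any of the rules are even considered.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for position, tree in enumerate(root.findall("Tree"), start=1):
        # Rules 1 and 2: both attributes are mandatory.
        for attribute in ("Genus", "Subgenus"):
            if tree.get(attribute) is None:
                problems.append(f"Tree {position}: missing {attribute} attribute")
        # Rule 3: zero or more CommonName children. Assumption: nothing else
        # is allowed inside a Tree element.
        for child in tree:
            if child.tag != "CommonName":
                problems.append(f"Tree {position}: unexpected child {child.tag}")
    return problems


valid_doc = (
    '<Trees><Tree Genus="Buxus" Subgenus="Sempervirens">'
    "<CommonName>Boxwood</CommonName></Tree></Trees>"
)
# Well-formed, but the Subgenus attribute is missing, so it breaks rule 2.
invalid_doc = '<Trees><Tree Genus="Taxus"><CommonName>Yew</CommonName></Tree></Trees>'

print(check_tree_rules(valid_doc))
print(check_tree_rules(invalid_doc))
```

Both sample documents parse without error, so both are well-formed, but only the first satisfies the rules. In practice you would write the rules as a Schema and let a validating parser do this work for you.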
Where now?
That’s a brief overview of XML for now. The next post in the series will look at XML and where it is used in SharePoint.