
Using schema.org as a starting point for (headless) WCM

(Alternatively, why are WCM systems – particularly headless ones – not embedding this stuff yesterday? Comments welcome!)

First, a little background for those unfamiliar with schema.org. This spartan (ugly?) website hides some very powerful technology and concepts. It’s an open community project started by the major search engines (Google, Microsoft, Yahoo and Yandex) to codify structured data for very common data types and use cases. At this point it is very comprehensive, with hundreds of types and well over a thousand properties across all areas of life. Want to describe an artery? There is a schema for that. Train station? Sculpture? Yup.

Content modelling is a time-consuming process (there are great books on the subject), but it’s necessary for multi-channel use, and every headless CMS depends on it. That makes schema.org a great resource to learn from and to use.

First, you’ll notice that these schemata are, ahem, very “complete”, in that any schema probably has a lot more information than you will need in your particular application. Second, you’ll probably notice that many schemata are linked to each other. For example, TrainStation inherits from CivicStructure, Place and Thing (which is the default starting schema for everything, no pun intended), and it can also appear as the value of the arrivalStation or departureStation property of a TrainTrip.
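To make the inheritance concrete, here is a minimal Python sketch. The parent chain is the one described above; the property lists are a tiny hand-picked subset for illustration (schema.org defines many more), and the names `PARENTS`, `OWN_PROPERTIES`, `ancestors` and `all_properties` are my own inventions, not part of any schema.org tooling.

```python
# Parent chain for TrainStation, as described in the paragraph above.
PARENTS = {
    "TrainStation": "CivicStructure",
    "CivicStructure": "Place",
    "Place": "Thing",
    "Thing": None,  # Thing is the root type
}

# Illustrative subset only: a few properties per type, chosen by hand.
OWN_PROPERTIES = {
    "Thing": ["name", "description", "url"],
    "Place": ["address", "geo", "telephone"],
    "CivicStructure": ["openingHours"],
    "TrainStation": [],  # nothing listed for this sketch
}


def ancestors(type_name):
    """Return the type followed by its ancestors, most specific first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENTS[current]
    return chain


def all_properties(type_name):
    """Collect own plus inherited properties, starting from the root type."""
    props = []
    for t in reversed(ancestors(type_name)):
        props.extend(OWN_PROPERTIES[t])
    return props


print(ancestors("TrainStation"))
print(all_properties("TrainStation"))
```

Even with this toy subset, a TrainStation ends up with seven fields it never declared itself, which is exactly why the real schemata feel so “complete”.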

Now, the really interesting part is why this is useful (and it is not explicitly explained on the site). It turns out that Google has, for quite some time now, been using this structured data directly in search results and Gmail to help drive the user experience (save a click, create calendar entries automatically, etc.). The usual output format is JSON-LD, which is simply a JSON representation of data described with the schema. (There is also a “microdata” format, which spreads the same data across attributes in your HTML markup, but man, it’s heavy, so by all means use JSON-LD.) Google maintains a website outlining how you can get started with structured data in their ecosystem. Although they have clearly started from a place of “publisher first” (content like articles, recipes, movie reviews, etc.), I strongly suspect it will cover more and more elements over time, not just for these visual elements but particularly for voice.

Back to the train example, here are a number of ways Google is using schema.org in JSON-LD/microdata:

  • Reservation confirmation – TrainReservation schema – “The Google app will display the reservation details on the day of the journey and will notify the user of the time to leave to get to the train station on time (taking into account the transport mode, traffic etc). If you provide a check-in URL like in the example below, the Google app will display this to the user 24 hours prior to the trip to the user.”
  • Display results in search – information you put into your page output looks like this:

And the JSON-LD which resulted in Google being able to properly parse and display that output looked like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TrainTrip",
  "departureStation": {"@type": "TrainStation", "name": "Bologna"},
  "arrivalStation": {"@type": "TrainStation", "name": "Florence"},
  "provider": [null, null, null, null, null],
  "potentialAction": {
    "@context": "https://schema.org", "@type": "TravelAction", "distance": 81,
    "fromLocation": {"@type": "City", "name": "Bologna"},
    "toLocation": {"@type": "City", "name": "Florence"},
    "result": {"@type": "AggregateOffer", "lowPrice": "11.73 €"}
  }
}
</script>
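
If your content lives in a CMS, you would normally generate that markup rather than type it by hand. The following is a minimal Python sketch of the idea, using only the standard library. The function names `train_trip_jsonld` and `to_script_tag` are hypothetical, and it only covers the two station properties from the snippet above, not the full output.

```python
import json


def train_trip_jsonld(departure, arrival):
    """Build a TrainTrip as a plain dict using schema.org type and property names."""
    return {
        "@context": "https://schema.org",
        "@type": "TrainTrip",
        "departureStation": {"@type": "TrainStation", "name": departure},
        "arrivalStation": {"@type": "TrainStation", "name": arrival},
    }


def to_script_tag(data):
    """Serialize to JSON and wrap it in the script element used for JSON-LD."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so that a string value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


tag = to_script_tag(train_trip_jsonld("Bologna", "Florence"))
print(tag)
```

Letting `json.dumps` do the serialization also rules out the kind of hand-editing slips (a missing colon, a stray bracket) that make a parser reject the whole block.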

So, as a company, if you know you are going to be outputting JSON-LD in order to improve SEO and customer experience, you can use these schemata all the way from planning, to your CMS internal content model, to your output.

Now, one thing that schema.org won’t do for you is map to your product strategy, content strategy or layout. Looking at the schema for a Product is fantastic if you are selling shoes, as it will help with most of the fields you need (SKU, model, manufacturer, etc.). It’s less useful if you are doing things like tying aspirational elements to the product via copy and layout, so you need to be able to take what you need, discard the rest, and adapt. Similarly, schema.org does not take any channel output into account, so you need to be able to add fields and limit them depending on the publishing channel (e.g. social, mobile).
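
As a rough sketch of “take what you need, discard the rest, and adapt”, the Python below keeps a handful of Product properties and bolts on channel-specific fields. Everything apart from the schema.org property names is an assumption of mine: the channel names, the extra fields such as `heroCopy`, and the helper functions are all hypothetical.

```python
# The schema.org Product properties we decide to keep; everything else is discarded.
KEEP = ["name", "sku", "model", "manufacturer"]

# Hypothetical channel-specific fields that schema.org knows nothing about.
CHANNEL_FIELDS = {
    "web": ["heroCopy", "layoutVariant"],
    "social": ["shortCopy"],
}


def content_model(channel):
    """Fields an editor fills in for one channel: kept properties plus our additions."""
    return KEEP + CHANNEL_FIELDS.get(channel, [])


def structured_data(item):
    """Only the kept schema.org properties make it into the JSON-LD output."""
    data = {"@context": "https://schema.org", "@type": "Product"}
    data.update({key: value for key, value in item.items() if key in KEEP})
    return data


shoe = {"name": "Trail Runner", "sku": "TR-42", "heroCopy": "Run further."}
print(content_model("web"))
print(structured_data(shoe))
```

The point of the split is that the internal content model can grow per channel while the structured data you publish stays limited to what schema.org actually defines.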


Now that I am outside the vendor sphere, I think it would be fantastic if more vendors used schema.org as a starting point. I am undertaking a project where many of the content types we need are fully described in schema.org and used extensively by Google (as JSON-LD), so using it end-to-end in the content supply chain makes total sense.

And yet, to the best of my knowledge, there is not a single WCM vendor (headless or no) that allows importing or using schema.org as a starter kit. It would be a lovely thing to be able to see the entire schema tree as a starting point and just search, select/deselect nodes and elements, and *boom* instant, linked content model.
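
To show how little machinery the idea needs, here is a toy Python sketch of that search, select and deselect flow. It is not based on any vendor’s product: `SCHEMA_EXCERPT` is a hand-typed, simplified excerpt of the schema.org tree (real properties often allow several expected types), and `search` and `build_model` are hypothetical helpers.

```python
# Hand-typed, simplified excerpt: type -> {property: expected type}.
SCHEMA_EXCERPT = {
    "TrainTrip": {
        "departureStation": "TrainStation",
        "arrivalStation": "TrainStation",
        "departureTime": "DateTime",
        "trainNumber": "Text",
    },
    "TrainStation": {"name": "Text", "address": "Text", "telephone": "Text"},
}


def search(term):
    """Case-insensitive search over type names."""
    return sorted(t for t in SCHEMA_EXCERPT if term.lower() in t.lower())


def build_model(selected, deselected=()):
    """Keep the selected types minus deselected (type, property) pairs.

    A property whose expected type is also selected becomes a reference,
    which is what makes the resulting content model linked.
    """
    model = {}
    for type_name in selected:
        fields = {}
        for prop, expected in SCHEMA_EXCERPT[type_name].items():
            if (type_name, prop) in deselected:
                continue
            kind = "reference" if expected in selected else "value"
            fields[prop] = {"type": expected, "kind": kind}
        model[type_name] = fields
    return model


model = build_model(
    ["TrainTrip", "TrainStation"],
    deselected={("TrainStation", "telephone")},
)
print(search("train"))
print(model["TrainTrip"]["arrivalStation"])
```

A real importer would read the published schema.org vocabulary instead of a hard-coded dict, but the select/deselect step would look much the same.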

(Note: Sitecore Commerce does use Product elements from schema.org, but I still think it would be useful to see the tooling and practice of working with schema.org more widely adopted.)
