Using schema.org as a starting point for (headless) WCM

(Alternatively, why are WCM systems – particularly headless ones – not embedding this stuff yesterday? Comments welcome!)

First, a little background for those unfamiliar with schema.org. This spartan (ugly?) website hides some very powerful technology and concepts. It’s an open community project started by various tech leaders to codify structured data for very common data types and use cases. At this point in time, it is very comprehensive, encompassing hundreds of schemata across all areas of life. Want to describe an artery? There is a schema for that. Train station? Sculpture? Yup.

Content modelling is a time-consuming process (there are great books on the subject) but it’s necessary for multi-channel use and literally every headless CMS system depends on it. Because of this key requirement, schema.org is a great resource to learn from and use.

First, you’ll notice that these schemata are, ahem, very “complete” in that any schema probably has a lot more information than you will need in your particular application. Second, you will probably notice that many schemata share linked information. For example, TrainStation inherits from CivicStructure, Place and Thing (which is the default starting schema for everything – no pun intended) – and may also be the value of the arrivalStation and departureStation properties in a TrainTrip.
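To make the inheritance and linking concrete, here is a minimal sketch of how that slice of the tree could look as a content model. The Python field names are my own, only a handful of properties are modelled, and TrainTrip is hung directly off Thing for brevity (on schema.org it sits further down, under Trip), so treat it as an illustration rather than a faithful copy of the schemata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thing:
    # schema.org's root type; only two of its properties are modelled here.
    name: str
    description: Optional[str] = None


@dataclass
class Place(Thing):
    address: Optional[str] = None


@dataclass
class CivicStructure(Place):
    opening_hours: Optional[str] = None  # openingHours on schema.org


@dataclass
class TrainStation(CivicStructure):
    pass  # adds nothing of its own in this sketch; everything is inherited


@dataclass
class TrainTrip(Thing):
    # Simplification: parented directly on Thing for brevity.
    # "Linked" properties point at other types rather than plain strings.
    departure_station: Optional[TrainStation] = None
    arrival_station: Optional[TrainStation] = None


def ancestry(cls: type) -> list[str]:
    """Return the type's name followed by its parents, ending at Thing."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


trip = TrainTrip(
    name="Bologna to Florence",
    departure_station=TrainStation(name="Bologna"),
    arrival_station=TrainStation(name="Florence"),
)
print(ancestry(TrainStation))
```

The point is simply that a TrainStation picks up name, description and address for free, and that a TrainTrip refers to stations as structured items instead of free text.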

Now, the really interesting part is why this is useful (which is not explicitly explained on the site). It turns out that Google has been (for quite some time now) actively using this structured data directly in search results and Gmail to help drive the user experience (save a click, create calendar entries automatically, etc.). The usual output format is JSON-LD, which is simply a JSON representation of the schema (there is also a “microdata” format that embeds the same information in HTML attributes, but man, it’s heavy, so by all means use JSON). Google maintains a website outlining how you can get started with structured data in their ecosystem. Although they have clearly started from a place of “publisher first” (content like articles, recipes, movie reviews, etc.), I strongly suspect it will begin to cover more and more elements over time – not just for these visual elements, but particularly for voice.

Back to the train example, here are a number of ways Google is using schema.org in JSON-LD/microdata:

  • Reservation confirmation – Train Reservation schema – “The Google app will display the reservation details on the day of the journey and will notify the user of the time to leave to get to the train station on time (taking into account the transport mode, traffic etc). If you provide a check-in URL like in the example below, the Google app will display this to the user 24 hours prior to the trip to the user.”
  • Display results in search – information you put into your page output looks like this:

And the JSON-LD which resulted in Google being able to properly parse and display that output looked like this:

<script type="application/ld+json">
{"@context":"https://schema.org","@type":"TrainTrip",
 "departureStation":{"@type":"TrainStation","name":"Bologna"},
 "arrivalStation":{"@type":"TrainStation","name":"Florence"},
 "provider":[null,null,null,null,null],
 "potentialAction":{"@context":"https://schema.org","@type":"TravelAction","distance":81,
   "fromLocation":{"@type":"City","name":"Bologna"},
   "toLocation":{"@type":"City","name":"Florence"},
   "result":{"@type":"AggregateOffer","lowPrice":"11.73 €"}}}
</script>

So, as a company, if you know you are going to be outputting JSON-LD in order to improve SEO and customer experience, you can use these schemata all the way from planning, to your CMS internal content model, to your output.
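As a rough sketch of the last step in that chain (content model to output), this is how a delivery layer might turn field values into the script element. The function names are hypothetical and not taken from any particular CMS; the only fixed parts are the @context/@type envelope and the application/ld+json script type.

```python
import json


def to_json_ld(schema_type: str, properties: dict) -> dict:
    """Wrap CMS field values in the @context/@type envelope JSON-LD expects."""
    return {"@context": "https://schema.org", "@type": schema_type, **properties}


def station(name: str) -> dict:
    # Nested items carry their own @type and reuse the outer @context.
    return {"@type": "TrainStation", "name": name}


def render_script_tag(document: dict) -> str:
    """Serialise to the <script> element that goes into the page output."""
    body = json.dumps(document, ensure_ascii=False, indent=2)
    # Escape "</" so a field value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


trip = to_json_ld(
    "TrainTrip",
    {"departureStation": station("Bologna"), "arrivalStation": station("Florence")},
)
print(render_script_tag(trip))
```

If the content model already uses schema.org property names, this step is little more than serialisation, which is the appeal of using the same vocabulary end to end.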

Now, one thing that schema.org won’t do for you is map to your product strategy, content strategy or layout. Looking at the schema for a Product is fantastic if you are selling shoes, as it will help with most of the fields you need (SKU, model, manufacturer, etc.), but it’s less useful if you are doing things like tying aspirational elements to it via copy and layout – so you need to be able to take what you need, discard the rest, and adapt. Similarly, schema.org does not take any channel output into account – so you need to be able to add fields and limit them depending on publishing channels (e.g. social, mobile).
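Here is one way the “take what you need, discard the rest, and adapt” step could be sketched. The property set is a small hand-copied subset of Product, and the channel names, custom fields and helper functions are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical channel names for this sketch.
ALL_CHANNELS = frozenset({"web", "mobile", "social"})

# A hand-copied subset of schema.org Product properties (the real type has many more).
PRODUCT_PROPERTIES = {
    "name", "description", "sku", "model", "manufacturer", "gtin", "color", "material",
}


@dataclass(frozen=True)
class Field:
    name: str
    from_schema: bool                   # True if it maps to a schema.org property
    channels: frozenset = ALL_CHANNELS  # where the field is published


def build_content_type(schema_props: set[str], keep: list[str],
                       custom: dict[str, set[str]]) -> list[Field]:
    """Keep the chosen schema properties, then append custom channel-limited fields."""
    unknown = set(keep) - schema_props
    if unknown:
        raise ValueError(f"not in the schema: {sorted(unknown)}")
    fields = [Field(name, from_schema=True) for name in keep]
    fields += [Field(name, False, frozenset(ch)) for name, ch in custom.items()]
    return fields


def fields_for(content_type: list[Field], channel: str) -> list[str]:
    """Field names that should be delivered to a given channel."""
    return [f.name for f in content_type if channel in f.channels]


def json_ld_fields(content_type: list[Field]) -> list[str]:
    """Only schema-backed fields belong in the JSON-LD output."""
    return [f.name for f in content_type if f.from_schema]


shoe = build_content_type(
    PRODUCT_PROPERTIES,
    keep=["name", "sku", "model", "manufacturer"],
    custom={"hero_copy": {"web"}, "share_text": {"social"}},
)
print(fields_for(shoe, "social"))
print(json_ld_fields(shoe))
```

Keeping a flag for which fields came from the schema means the aspirational copy can live in the same content type without leaking into the structured data.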


Now that I am outside the vendor sphere, I think it would be fantastic if most vendors were able to use schema.org as a starting point. I am undertaking a project where many of the content types we need are fully described in schema.org, and used extensively by Google (as JSON-LD) – so using this end-to-end in the content supply chain makes total sense.

And yet, to the best of my knowledge, there is not a single WCM vendor (headless or no) that allows importing or using schema.org as a starter kit. It would be a lovely thing to be able to see the entire schema tree as a starting point and just search, select/deselect nodes and elements, and *boom* instant, linked content model.
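To show what I mean by search, select/deselect and an instant linked model, here is a toy sketch. The SCHEMA dictionary is a tiny hand-written excerpt (real tooling would load the published vocabulary instead), and the function names are hypothetical.

```python
# Hand-written excerpt: type -> (parent, {property: expected type}).
SCHEMA = {
    "Thing": (None, {"name": "Text", "description": "Text"}),
    "Place": ("Thing", {"address": "Text"}),
    "CivicStructure": ("Place", {"openingHours": "Text"}),
    "TrainStation": ("CivicStructure", {}),
    "Intangible": ("Thing", {}),
    "Trip": ("Intangible", {"departureTime": "DateTime", "arrivalTime": "DateTime"}),
    "TrainTrip": ("Trip", {"departureStation": "TrainStation",
                           "arrivalStation": "TrainStation"}),
}


def search(term: str) -> list[str]:
    """Case-insensitive search over type names."""
    return sorted(t for t in SCHEMA if term.lower() in t.lower())


def inherited_properties(type_name: str) -> dict[str, str]:
    """Collect properties from the root down to the requested type."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = SCHEMA[current][0]
    props: dict[str, str] = {}
    for t in reversed(chain):
        props.update(SCHEMA[t][1])
    return props


def starter_model(type_name: str, deselect: frozenset = frozenset()) -> dict:
    """Build a starter content type, dropping deselected properties."""
    fields = {k: v for k, v in inherited_properties(type_name).items()
              if k not in deselect}
    # Any field whose expected type is itself in the tree becomes a link.
    linked = sorted({v for v in fields.values() if v in SCHEMA})
    return {"type": type_name, "fields": fields, "linked_types": linked}


print(search("train"))
print(starter_model("TrainTrip", deselect=frozenset({"description"})))
```

Selecting TrainTrip pulls in TrainStation as a linked type automatically, which is exactly the kind of head start I would want from a starter kit.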

(Note, Sitecore Commerce does use Product elements from schema.org – but I still think it would be useful to see the tooling and practice of working with schema.org more widely adopted)

Predictions for 2020

Two analyst firms put out some predictions for the future of WCM earlier this year.

I’m going to outline my 2020 predictions for the market – a little early as I want to get a head start. I reference both firms above because in a lot of cases, I agree with the thinking. In fact, much of the “agile CMS” manifesto from Mark Grannan was very similar to some of the underlying strategy behind the Stylelabs acquisition and some of the CMP integration roadmap. That said, I don’t want to repeat what’s been said already, so here are my (hopefully distinct) observations.

Vendor and category blur intensifies

All sorts of weird and wonderful things are happening to martech, but the biggest one of consequence is the overall level of market confusion that businesses will have to contend with. WCM is getting both bigger (continually adding chunks of CDP, adtech, marketing automation, etc.) and smaller (headless, SaaS solutions such as HubSpot CMS and Wix.com). Questions around cloud (IaaS, PaaS, single-tenant SaaS, multi-tenant SaaS, static-site generation, front-end JavaScript, CDNs, Netlify, Gatsby.js, etc. etc. etc.), subscription vs. perpetual and pretty much every combination add to the level of confusion.

CDP as a standalone category first struggled to differentiate itself from DMP (good article from Martin Kihn, formerly of Gartner, now at Salesforce), and now has to contend with being absorbed by marketing cloud platforms, WCM and CRM vendors (did I mention Martin Kihn was now at Salesforce? What a coincidence…). For various reasons (mostly to do with some big-data buzzwords) the CDP category has massively exploded well beyond the ability of the market to carry it (expect a blog post on that one).

Similarly, WCM will start to get absorbed into larger marketing clouds. Salesforce has a CMS beta in Community Cloud, and many of the other functions (CDP, personalization, user management, marketing automation, etc.) already live in those clouds. SAP is also making a concerted effort around CX – their current offering (formerly Hybris) plus Gigya give them a solid base. I am far less sure of the value the Qualtrics acquisition brings to the play (especially at a 20x valuation) – unless it’s akin to the Microsoft acquisition of Groove Networks (buying people and culture – namely folks like Ray Ozzie who became the CTO and ultimately laid the groundwork for cultural transformation and cloud).

CMP consolidating or being subsumed into adjacent technologies

The core functions of a Content Marketing Platform (collaboration, curation, campaign planning and content strategy/funnel mapping) are functions that should have been part of WCM for quite some time. I wrote a longer post here, but in summary, the need for CMP will accelerate, but the sector itself will face some revenue/growth challenges and consolidation.

Adtech vs. martech replacing the old IT vs. Marketing battleground

The main source of “creative tension” in WCM has always been characterized by the relationship between IT and the marketing organization. WCM used to be almost solely the domain of IT; it was a lot of infrastructure to get up and running and IT would treat the deployment of content much like handling support tickets, which of course was not tenable. While there are still many IT-centric WCM systems – and IT still has a veto over technology choices – it could be argued that marketing is now firmly in the driver’s seat as customer experience drives more budget and purchasing decisions.

The new battleground for budget and attention will be adtech vs. martech. Recently, Jason Fried of Basecamp tweeted a well-known but interesting point: competitors can buy up keywords of your company/product name. Add the more obvious point that buying keywords is easy, and there are fewer and fewer opportunities for differentiation. Combine this with the general trend of consumers opting out of advertising, and lazy brand-building is no longer acceptable and advertising as a model is at risk.

This of course means that the slack needs to be picked up by better customer experience, which is enabled by better customer journeys, and far better content marketing. Some example vendors doing a great job of this;

SaaS takes over the WCM mid-market, but not the enterprise

This has actually been happening for quite some time – WordPress is basically a SaaS offering in most contexts (PHP is easy to deploy and customize in real-time, either on their own or your hosting platform) and you need only look at its prevalence in the Stackie Awards year-over-year to see how common it is, even among customers with greater marketing needs (the business logic is offloaded to other integrated systems).

What is new in 2019 is that in some stacks, there is no specific callout for WCM at all. Often you will simply see a reference to HubSpot or Salesforce Marketing Cloud. Similarly, in Commerce, Shopify often owns a large segment of the market where lower Gross Merchandise Value (GMV) makes it a simpler option.

(PS – as per Shopify’s 40-F – they rely solely on Stripe as a payment provider. How long before these two shack up and own both online and PoS channels?)

However, in the enterprise space, most customers still have strong integration/customization requirements. In some cases, legislation may mandate an on-prem solution, or they are tied to legacy back-end systems. And even where these requirements don’t exist, the vendors servicing those markets are tied to some legacy stacks and the sheer amount of functionality makes it hard to migrate. Adobe and others are making managed services far easier, but this is still not quite a true SaaS play yet. Only one vendor (to my knowledge) has successfully undergone this transformation: Oracle. It started with a “leading (but legacy) enterprise” stack in the old ATG commerce, spent a number of years in the wilderness after dropping out of the Forrester leader position, and only recently returned to market leadership with a full refactoring as Oracle Commerce Cloud (while notably also retaining IaaS/PaaS/on-prem flexibility with Oracle Commerce and much shared codebase). But even that required a number of years and considerable “market confusion” (as per Gartner) about the various offerings.

Wildcard bonus prediction: Adobe to (maybe) acquire some overlapping products

I’ve never quite got my head around the Adobe acquisition of Magento. AEM is a very enterprise-heavy, Java-based WCM system. While I had predicted that Adobe would still buy a Commerce solution (despite having their long-standing and solid Hybris partnership stymied by SAP), I would not have guessed Magento. While Adobe inherited some open-source roots from Day, including elements like the Jackrabbit JCR and Sling being Apache projects, AEM is now largely a proprietary play with an emphasis on enterprise development and integration and not an open-source community or model.

In turn, Magento is a PHP-based system, with a mid-market focus and a still active open-source model. Some analysts have argued that Adobe wasn’t really acquiring the technology, they were acquiring the community, which I don’t disagree with. (It could have also been a spoiler to prevent Acquia from further deepening their partnership with Magento, all their other competitors at that level – namely Sitecore and Episerver – having eCommerce offerings themselves).

That said, they now have two very different products, markets and models to anchor their main Customer Experience entry points – so here’s my wildcard prediction: I think Adobe will still look to acquire a Java-based Commerce system (my guess would be Elastic Path) to fit snugly with AEM and also look to either buy or build a WCM that is more appropriate to Magento and that mid-market/community-based audience.

There is certainly some precedent for acquiring and maintaining overlapping vendors: Adobe acquiring both Neolane and Marketo, Salesforce acquiring Demandware and CloudCraze (not to mention owning Pardot, which they got by virtue of it being acquired by ExactTarget) – but at this point in the vendor musical chairs, there are fewer options that may make sense at the price (versus building), so I am actively hedging on this one. It may not happen, but I would be completely unsurprised if it did. But I do think that Adobe will have more trouble trying to push AEM down to mid-market (the architecture alone makes this difficult), so they need to do something drastic if they want to succeed in both markets.