Channel: Codewerks

My first attempt at a Microdata Extractor.


I’ve just pushed version 10^-2 of MD_Extract to GitHub. It’s my first attempt at a Microdata consumer.

I based the extraction algorithm on the one published by the WHATWG, though the implementation varies in places, mainly for clarity of code and partly because it is written in PHP. I took Tab’s suggestion: the code does a first pass through the HTML tree to collect references to every element with an ID (the elements that itemref attributes point to), which makes it much clearer and nicer than what I had originally planned. In fact, I think the algorithm is beautiful (and it’s O(n), where n is the number of nodes in the HTML tree).
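To make the two-pass idea concrete, here is a minimal sketch in Python. It is not MD_Extract’s PHP code, and every name in it (`TreeBuilder`, `extract`, `extract_item`, `item_properties`) is hypothetical. It assumes reasonably well-formed HTML parsed with the standard library’s `html.parser`, covers only a subset of the spec’s property-value rules, does not resolve URLs against the document’s base URL, and does not re-sort properties into tree order as the spec requires. Pass 1 builds an id-to-element map; pass 2 walks each top-level item, using the map to resolve `itemref` directly.

```python
from collections import deque
from html.parser import HTMLParser

# Void elements have no end tag, so the tree builder never makes them "current".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Simplified subset of the spec's rules for which attribute holds a property's value.
VALUE_ATTR = {"meta": "content", "a": "href", "area": "href", "link": "href",
              "img": "src", "audio": "src", "video": "src", "source": "src",
              "iframe": "src", "embed": "src", "object": "data",
              "data": "value", "meter": "value"}

class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # child Nodes and text strings, in document order

class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes reasonably well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current  # close the nearest open element with this tag
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)

def walk(node):
    """Pre-order (tree-order) traversal over element nodes."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)

def text_content(node):
    return "".join(c if isinstance(c, str) else text_content(c) for c in node.children)

def prop_value(node):
    if node.tag == "time" and "datetime" in node.attrs:
        return node.attrs["datetime"] or ""
    if node.tag in VALUE_ATTR:
        return node.attrs.get(VALUE_ATTR[node.tag]) or ""
    return text_content(node)

def item_properties(item, ids):
    """Property elements of `item`: its descendants plus its itemref targets,
    without descending into nested items. Not re-sorted into tree order."""
    pending = deque(c for c in item.children if isinstance(c, Node))
    for ref in (item.attrs.get("itemref") or "").split():
        if ref in ids:  # direct lookup thanks to the first pass
            pending.append(ids[ref])
    seen, props = {id(item)}, []
    while pending:
        node = pending.popleft()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if "itemprop" in node.attrs:
            props.append(node)
        if "itemscope" not in node.attrs:
            pending.extend(c for c in node.children if isinstance(c, Node))
    return props

def extract_item(item, ids, memory=frozenset()):
    memory = memory | {id(item)}  # items currently being expanded (cycle guard)
    result, props = {}, {}
    if item.attrs.get("itemtype"):
        result["type"] = item.attrs["itemtype"].split()
    for el in item_properties(item, ids):
        if "itemscope" in el.attrs:
            value = "ERROR" if id(el) in memory else extract_item(el, ids, memory)
        else:
            value = prop_value(el)
        for name in (el.attrs.get("itemprop") or "").split():
            props.setdefault(name, []).append(value)
    result["properties"] = props
    return result

def extract(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Pass 1: map each id to the first element carrying it, in tree order.
    ids = {}
    for node in walk(builder.root):
        if node.attrs.get("id"):
            ids.setdefault(node.attrs["id"], node)
    # Pass 2: top-level items are itemscope elements without an itemprop.
    return {"items": [extract_item(n, ids) for n in walk(builder.root)
                      if "itemscope" in n.attrs and "itemprop" not in n.attrs]}
```

With the id map built up front, resolving an `itemref` is a dictionary lookup rather than a search of the whole document, which is what keeps the second pass simple. The `"ERROR"` value mirrors the spec’s handling of an item that (via `itemref`) ends up as a property of itself.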

I have versioned it at 10^-2 because I haven’t found many examples to test it against. There are also some anticipated problems with character encodings that are not supersets of ASCII, and a couple of small things I’d like to add. But as far as I know, it is 100% compliant with the latest spec as far as microdata syntax is concerned.

