I’ve completely changed the way the string representing the HTML is preprocessed before being fed to tidy — both the function and the approach. The function is not particularly elegant, but it fixes a bunch of bugs: it’s mostly character iteration and lots and lots of flags (old-school style!). After some quick browsing through the HTML parsing algorithm provided by the WHATWG, it got me thinking about whether I shouldn’t just write my own parser (though that looks rather hard and, especially, time-consuming). I’ve also been looking at the source code of tidy, and though it’s quite big, the other option would be to contribute to it and help update it to HTML5. But it would take me some time to get to know the code base, the project seems to have been abandoned, and it might be too big for just one person to work on. Anyhow, I’m not promising anything so far.
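To give a flavor of what that kind of flag-driven character iteration looks like, here is a minimal, hypothetical sketch (not the library’s actual code): a preprocessor that walks the string with a couple of state flags and escapes stray `<` characters in text content, the sort of loose markup that trips up a strict cleaner like tidy.

```php
<?php
// Hypothetical sketch of old-school, flag-based preprocessing:
// iterate character by character, tracking whether we are inside
// a tag and inside an attribute quote, and escape stray '<' in text.
function escapeStrayLt(string $html): string
{
    $out = '';
    $inTag = false;   // currently inside a <...> construct
    $quote = '';      // active attribute quote character, if any
    $len = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        $c = $html[$i];

        if ($inTag) {
            if ($quote !== '') {
                if ($c === $quote) {
                    $quote = '';   // closing quote of an attribute value
                }
            } elseif ($c === '"' || $c === "'") {
                $quote = $c;       // opening quote of an attribute value
            } elseif ($c === '>') {
                $inTag = false;    // end of the tag
            }
            $out .= $c;
            continue;
        }

        if ($c === '<') {
            $next = ($i + 1 < $len) ? $html[$i + 1] : '';
            // A '<' only opens markup when followed by a letter, '/', '!' or '?'
            if (ctype_alpha($next) || $next === '/' || $next === '!' || $next === '?') {
                $inTag = true;
                $out .= $c;
            } else {
                $out .= '&lt;';    // stray '<' in text: escape it
            }
        } else {
            $out .= $c;
        }
    }
    return $out;
}
```

For example, `escapeStrayLt('<p>2 < 3</p>')` returns `<p>2 &lt; 3</p>`. The real preprocessing in the library handles more cases than this, but the shape — one pass, a handful of booleans — is the same.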
I do understand that the library’s current approach (preprocessing and then sending to tidy) is not the most efficient one. But there is another take on efficiency, and that’s economic efficiency: except for really heavy-duty Microdata consumption, the library does fulfill its purpose. And the truth is that Microdata is a new spec that has yet to be widely adopted, so performance is not a real concern right now. So the question is whether it makes sense to spend the next three months writing a parser from scratch when the one I have fits my needs (and probably those of 99.999% of the PHP developers who may use the library). So far I don’t see the point. But then again, my geeky side keeps bugging me to do it right.
Well, anyhow, if you find any bugs (and I’m sure there are many, simply because there are very few Microdata examples out there and I might be missing strange markup some user might come up with), please report them! Other than that, I will next write a post on why I believe Microdata to be better than Microformats, and I will probably also write a personal post that I’ve sort of been owing myself.