Handle Wikipedia table faulty boolean design; and HTML headers with row/colspan > 1 (also XLS) #105

@JBThiel

TO thombashi, Tsuyoshi Hombashi
SUBJ Handle Wikipedia table faulty boolean design, and headers with row/colspan > 1

I have noticed a couple of design flaws in how Wikipedia generates tables.
Maybe you would like to incorporate some workarounds in sqlitebiter:

A) Some (all?) tables use a GREEN CHECK / RED X indicator for Boolean columns. The design is broken: they use
only TD attributes to display the image, and the actual TD cell content is empty. This results in NULLs
coming out of conversion programs (incl. sqlitebiter).
This situation can be detected and fixed by looking for the pattern <td data-sort-value="Yes" ... (or "No") and injecting 1/0.
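
As a rough illustration of the workaround in (A), here is a minimal sketch of a pre-processing pass over the raw HTML. It assumes the markup is exactly as described above (an empty TD carrying data-sort-value="Yes" or "No"); the function name fill_boolean_cells is hypothetical and not part of sqlitebiter.

```python
import re

# Matches an EMPTY <td> whose attributes include data-sort-value="Yes" or "No".
# Assumption: the cell has an explicit closing </td> and no inner content.
_BOOL_TD = re.compile(
    r'(<td\b[^>]*\bdata-sort-value="(Yes|No)"[^>]*>)\s*(</td>)',
    re.IGNORECASE,
)


def fill_boolean_cells(html: str) -> str:
    """Inject 1/0 into empty cells flagged via data-sort-value (hypothetical helper)."""

    def _repl(match: re.Match) -> str:
        value = "1" if match.group(2).lower() == "yes" else "0"
        return f"{match.group(1)}{value}{match.group(3)}"

    return _BOOL_TD.sub(_repl, html)
```

Cells that already contain text are left untouched, so the pass should be safe to run before handing the HTML to a converter. A regex is used only to keep the sketch short; a real fix would more likely live in the HTML parsing step.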

B) Some tables have header cells with rowspan/colspan > 1, to make a fancy "grouped" header.
This breaks the column assignment for later data values.
For example, if the first header row has 5 cells including one with colspan=2, and the 2nd header row and data rows have 6 cells,
then converters (incl. sqlitebiter) will see only the 5 cells, and some data will go missing or end up in wrongly named columns.
It's a little tricky to detect/automate, since the rowspan/colspan settings are spread across multiple HTML lines.
Example broken Wikipedia tables: https://en.wikipedia.org/wiki/Comparison_of_text_editors
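
One possible way to handle (B) is to expand every spanned cell into a rectangular grid first, then join the stacked header rows into one name per column. The sketch below assumes the cells have already been parsed into (text, rowspan, colspan) tuples; expand_spans and flatten_header are hypothetical names, not existing sqlitebiter functions, and the sample header is made up.

```python
def expand_spans(rows):
    """Expand rows of (text, rowspan, colspan) cells into a rectangular grid."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Clamp rowspan so a cell never reaches past the last row.
            for dr in range(min(rowspan, len(rows) - r)):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_cols = max(col for _, col in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(len(rows))]


def flatten_header(header_grid):
    """Join stacked header rows into one name per column."""
    names = []
    for column in zip(*header_grid):
        parts = []
        for part in column:
            # Drop blanks and the repeats produced by expanding a rowspan.
            if part and (not parts or parts[-1] != part):
                parts.append(part)
        names.append(" ".join(parts))
    return names


# Made-up grouped header: "OS" spans two columns, the others span two rows.
header = [
    [("Name", 2, 1), ("OS", 1, 2), ("Cost", 2, 1)],
    [("Windows", 1, 1), ("Linux", 1, 1)],
]
columns = flatten_header(expand_spans(header))
# columns == ["Name", "OS Windows", "OS Linux", "Cost"]
```

With the header flattened this way, each data row lines up with exactly one column name, which is the part that goes wrong today.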

NOTE OF APPRECIATION FOR Tsuyoshi Hombashi
Very nice work on SqliteBiter. And great name, clever with good rhyme.
Just what we need to grab/manage those tables floating around everywhere.
I had come up with some multi-stage dataflows through other apps, but they took too many steps, so I kept searching
for a better solution, which you have made.
Thanks for producing such a useful comprehensive CLI converter tool.

Best regards,
John
-- jbthiel
