Have you ever heard of EasyChair? It is a free, simple and efficient way to manage a (scientific) conference: it provides
most of the tools for handling paper submission, acceptance and camera-ready submission (for further details please refer to the EasyChair website).
The first issue is that some advanced features (such as complete data access as XML) are only available as a paid service. What if the data you need is already available, but only as an (ugly) HTML file? I needed the whole list of accepted papers and the only option was an HTML page, laid out with DIVs and not, as some accessibility rules suggest, as a table. First solution: copy-and-paste from HTML to a spreadsheet. More advanced: write a script that converts the file to “well written” HTML. In the generated file the list of papers is in an HTML table, no stylesheets are applied and all the links to the authors' webpages are removed.
Here we go: a simple set of sed rules to convert the list of accepted papers to a table-based page.
Put all of these in a .sed file and invoke the sed command as:
sed -f file.sed < accepted-papers.html > accepted-papers-converted.html
The file.sed contents:
s/<br\/>/ /g
s/<style>.*<\/style>//g
s/<\/h1>/<\/h1><table>/g
s/<\/body>/<\/table><\/body>/g
s/<b>Abstract: <\/b>//g
s/<\/div><div class="paper">/<\/tr><tr><td>/g
s/<div class="paper">/<tr><td>/g
s/<span class="authors"><span>//g
s/<\/span>\. <\/span>/<\/td>/g
s/<span class="authors">//g
s/\. <\/span>/<\/td>/g
s/<span class="title">/<td>/g
s/<\/div><div class="abstract">/<\/td><td>/g
s/<a href="[^"]*">\([^<]*\)<\/a>/\1/g
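If you want to see what a single rule does before running the whole script, you can try it on one line. Here is a minimal sketch for the link-stripping rule. The sample line, the example.org URLs, the author names and the strip_links helper are all made up for illustration; the real EasyChair markup may well differ.

```shell
# Hypothetical sample line: the real EasyChair page may be structured differently.
sample='<span class="authors"><a href="http://example.org/~alice">Alice Rossi</a> and <a href="http://example.org/~bob">Bob Bianchi</a>. </span>'

# Hypothetical helper: apply only the link-stripping rule,
# keeping the anchor text (\1) and dropping the surrounding <a> tag.
strip_links() {
  sed 's/<a href="[^"]*">\([^<]*\)<\/a>/\1/g'
}

printf '%s\n' "$sample" | strip_links
# prints: <span class="authors">Alice Rossi and Bob Bianchi. </span>
```

Note that sed works line by line, so a rule like this only matches when the opening and closing tags sit on the same line of the input file.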