University Web Developers


We're in the process of moving from static HTML pages (Dreamweaver templates) to a content management system for our main university web site.  A lot of CMSes have import utilities that can use CSV (Comma Separated Value) files or XML.

I am looking for a way to grab all of the data on our old pages and put it into one giant CSV file (or at least a folder at a time) for bulk import into our new site.  Finding the content areas on our pages would be pretty easy, since they're already clearly marked in the code (e.g., "<!-- InstanceBeginEditable name="Text" -->").

Does anyone have experience doing something like this?  Something like Mozenda looks promising, but I'd prefer something that is a little cheaper or at least doesn't charge by the page.

Replies to This Discussion

I'm nearing the end of a 5-month CMS migration from Dreamweaver. I did it by hand (read: work-study students) because the HTML in Dreamweaver had 10 years of cruft built up and I didn't want that in my shiny new CMS. I could have done it in a couple days, though, with a Ruby script. If you don't have someone with those skills, I'm sure you could hire someone (or yours truly) for just a few hours to do it.

I'd second the notion that depending on how recently those pages were created, you could be bringing along font tags and the associated "cruft."
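
For what it's worth, here is a minimal sketch of stripping that kind of cruft before import. It is in Python, it only handles font tags, and the function name `strip_font_tags` is made up for illustration. A regex is a crude tool for HTML, so treat this as a starting point rather than a complete cleaner.

```python
import re

# Matches opening and closing <font> tags, with or without attributes.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)

def strip_font_tags(html):
    """Remove <font> wrappers but keep the text they contained."""
    return FONT_TAG.sub("", html)

sample = '<p><font face="Arial" size="2">Hi</font> there</p>'
print(strip_font_tags(sample))
```

Old Dreamweaver pages usually carry other leftovers too (inline styles, empty paragraphs, layout tables), so a real cleanup pass would likely need a proper HTML parser.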

I'm working on something similar, parsing through HTML and producing an XML markup used by our new CMS. I just want to get stuff in. My tool of choice is Perl, given all the pattern-matching abilities of that language.
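
The same pattern-matching idea can be sketched in Python instead of Perl. This assumes the pages use the standard Dreamweaver markers, with `InstanceBeginEditable` opening a region and `InstanceEndEditable` closing it. The function name `extract_regions` and the sample page are hypothetical.

```python
import re

# Matches one Dreamweaver editable region and captures its name and body.
EDITABLE_REGION = re.compile(
    r'<!--\s*InstanceBeginEditable\s+name="([^"]+)"\s*-->'
    r"(.*?)"
    r"<!--\s*InstanceEndEditable\s*-->",
    re.DOTALL | re.IGNORECASE,
)

def extract_regions(html):
    """Return {region name: inner HTML} for every editable region in html."""
    return {name: body.strip() for name, body in EDITABLE_REGION.findall(html)}

# Made-up sample page for illustration.
sample = """<html><body>
<!-- InstanceBeginEditable name="Text" -->
<p>Welcome to the department.</p>
<!-- InstanceEndEditable -->
</body></html>"""

regions = extract_regions(sample)
print(regions["Text"])
```

From there each region could be written out as a CSV column or an XML element, depending on what the target CMS imports.
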
Stephanie Leary built a really great plugin for WordPress that does just this. You could pull your static content into a WP install and export to XML. Might be a quick and easy way to accomplish the task.

Just curious, what kind of CMS will you be using?

dotCMS is the frontrunner at the moment.
I've helped with a few such migrations. Getting the data areas themselves isn't hard. A good developer could write a script for it rather easily. It's the formatting, accessibility, etc. that is the problem, especially with DW, where there tends to be a lot of garbage code. I've tried plugins and other methods, but in the end your best bet might be something along the lines of a student worker and a text editor. Anything else can cause all sorts of extra problems down the line.
Hi Stephen,

At NNU, we just accomplished content migration from static HTML files (Dreamweaver) into Typo3 using a custom extension we built. I'd be happy to let you look at the code and see if you can use it. You might also consider Typo3 as a CMS.

How big is the site you are looking to migrate?



How deeply have you delved into this? Have you thought about site structure? Is that going to change at all? What about any attachments like PDFs and images that are part of the content?

The Typo3 extension we built not only imports the HTML files into the db, it also imports any files, such as PDF (you specify file types), rebuilds the links to work with the CMS, and if you have Dreamweaver template tags around your content it can target only that area.

Of course there is still some manual editing we are doing, but I processed a 35k+ file site in less than an hour and have only needed to make relatively small tweaks to what was imported (not counting actual restructuring).

The extension is still a little rough, but fully workable. We were planning on releasing it back to the Typo3 community after a little cleanup, but if you would like to take a gander at it, please feel free to talk to Zac or myself.
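
To give a rough idea of what "rebuilding the links" can mean, here is a small Python sketch. It is not the Typo3 extension's code. The function name `rewrite_links` and the old-to-new URL map are assumptions for illustration only.

```python
import re

# Captures the quoted value of every href attribute.
HREF = re.compile(r'(href=")([^"]+)(")', re.IGNORECASE)

def rewrite_links(html, url_map):
    """Replace each href found in url_map; leave every other link untouched."""
    def swap(match):
        old = match.group(2)
        return match.group(1) + url_map.get(old, old) + match.group(3)
    return HREF.sub(swap, html)

# Hypothetical mapping from old static files to new CMS paths.
url_map = {"about.html": "/about"}
page = '<a href="about.html">About</a> <a href="http://example.com/">Other</a>'
print(rewrite_links(page, url_map))
```

A real importer would also need to handle relative paths, single-quoted attributes, and `src` attributes on images.
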
I have a BBEdit Text Factory that does this. If you're on a Mac, you can download BBEdit (free for 30 days), go to the menu that looks like a gear, open the Text Factories folder, and then put in the file below (once you've extracted it from the .zip archive). Once you put it into the folder, you can run the Factory over a whole folder/site full of documents.
I would look into YQL (Yahoo Query Language). You can build your own query that delivers either an XML tree or a JSON structure, then parse the result with any language. I've never used it in the manner you are looking to, but it's worth a shot.
Well, it doesn't look like it will do the job en masse, but DW CS5 can export the template data as an XML document. You can export the XML two ways: one that uses the editable section names as the XML tags and one that uses DW XML tags.

I recently built a migration script as a proof of concept that my college could make the change to Drupal pretty seamlessly.


Our current CMS (RedDot) has the ability to output valid XHTML files. I built a script in classic ASP (I'm currently porting it to PHP) that scraped these XHTML files using XPath and output them into a CSV file. I just made sure to consistently name elements (e.g., <div id="header">) so that XPath could find them.
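
The general shape of that approach, sketched in Python rather than ASP or PHP, looks something like the following. It uses only the standard library, whose `ElementTree` supports a limited XPath subset. The `content` id, the sample page, and the function names are assumptions for illustration. Only `header` comes from the example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}
FIELDS = ["header", "content"]  # "content" is a hypothetical id

def page_to_row(xhtml):
    """Pull the text of each consistently named div out of one XHTML page."""
    root = ET.fromstring(xhtml)
    row = []
    for field in FIELDS:
        node = root.find(f".//x:div[@id='{field}']", XHTML_NS)
        row.append("".join(node.itertext()).strip() if node is not None else "")
    return row

def pages_to_csv(pages):
    """Build CSV text with one header row and one row per page."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for xhtml in pages:
        writer.writerow(page_to_row(xhtml))
    return out.getvalue()

# Made-up sample page for illustration.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="header">Admissions</div>'
    '<div id="content"><p>Apply by March 1.</p></div>'
    "</body></html>"
)
print(pages_to_csv([sample]))
```

Note that this keeps only the text of each element. If the CMS import needs the inner markup, the nodes would have to be serialized instead (for example with `ET.tostring`).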



UWEBD has been in existence for more than 10 years and is the very best email discussion list on the Internet, in any industry, on any topic


© 2019   Created by Mark Greenfield.   Powered by
