University Web Developers

We're in the process of moving from static HTML pages (Dreamweaver templates) to a content management system for our main university website. Many CMSes have import utilities that accept CSV (comma-separated values) or XML files.

I am looking for a way to grab all of the content on our old pages and put it into one giant CSV file (or at least one per folder) for bulk import into our new site. Finding the content areas on our pages would be pretty easy, since they're already clearly marked in the code (i.e., <!-- InstanceBeginEditable name="Text" -->).
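
Because the editable regions are already delimited by Dreamweaver's comments, a short script can do the extraction. Below is a minimal sketch in Python (not code from anyone in this thread), assuming every page is a template instance that uses `<!-- InstanceBeginEditable name="..." -->` / `<!-- InstanceEndEditable -->` pairs, that regions don't nest, and that pages decode as UTF-8 (undecodable bytes are replaced). The function names `extract_regions` and `folder_to_csv` are hypothetical.

```python
import csv
import re
from pathlib import Path

# One Dreamweaver editable region: group 1 is the region name, group 2 is
# everything up to the next InstanceEndEditable comment (regions don't nest).
REGION_RE = re.compile(
    r'<!--\s*InstanceBeginEditable\s+name="([^"]+)"\s*-->'
    r"(.*?)"
    r"<!--\s*InstanceEndEditable\s*-->",
    re.DOTALL,
)


def extract_regions(html: str) -> dict[str, str]:
    """Return {region name: inner HTML} for every editable region on a page."""
    return {name: body.strip() for name, body in REGION_RE.findall(html)}


def folder_to_csv(root: Path, out_csv: Path, fields: list[str]) -> int:
    """Write one CSV row per .htm/.html file under root; return the row count.

    Regions not listed in `fields` are ignored; listed regions missing from a
    page are written as empty cells.
    """
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["path", *fields], extrasaction="ignore"
        )
        writer.writeheader()
        for page in sorted(root.rglob("*")):
            if not page.is_file() or page.suffix.lower() not in {".htm", ".html"}:
                continue
            html = page.read_text(encoding="utf-8", errors="replace")
            regions = extract_regions(html)
            writer.writerow({**regions, "path": page.relative_to(root).as_posix()})
            rows += 1
    return rows
```

For example, `folder_to_csv(Path("htdocs/admissions"), Path("admissions.csv"), ["Text"])` (hypothetical paths) would produce one folder's worth of rows with a `path` column and a `Text` column. Pointing it at the site root instead gives the "one giant CSV" version.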

Does anyone have experience doing something like this?  Something like Mozenda (http://www.mozenda.com/) looks promising, but I'd prefer something that is a little cheaper or at least doesn't charge by the page.

Replies to This Discussion

I'm nearing the end of a 5-month CMS migration from Dreamweaver. I did it by hand (read: work-study students) because the HTML in Dreamweaver had 10 years of cruft built up and I didn't want that in my shiny new CMS. I could have done it in a couple of days, though, with a Ruby script. If you don't have someone with those skills, I'm sure you could hire someone (or yours truly) for a few hours to do it.

Jason

I'd second the notion that, depending on how recently those pages were created, you could be bringing along font tags and the associated cruft.

I'm working on something similar: parsing HTML and producing the XML markup our new CMS uses. I just want to get the content in. My tool of choice is Perl, given the language's pattern-matching abilities.
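
Here is the same idea sketched in Python rather than Perl: turning already-extracted content into an XML import file with the standard library's `xml.etree.ElementTree`. The `<pages>`, `<page path=...>` and `<field name=...>` names are hypothetical placeholders; the target CMS's importer will dictate the real schema.

```python
import xml.etree.ElementTree as ET


def pages_to_xml(pages: dict[str, dict[str, str]]) -> str:
    """Serialize {page path: {region name: html}} as an XML import document.

    <pages>, <page path=...> and <field name=...> are placeholder names;
    substitute whatever schema the target CMS's importer expects.
    """
    root = ET.Element("pages")
    for path, regions in pages.items():
        page = ET.SubElement(root, "page", path=path)
        for name, html in regions.items():
            field = ET.SubElement(page, "field", name=name)
            # Stored as text, so ElementTree escapes <, > and & on output
            # instead of trying to treat the old markup as XML elements.
            field.text = html
    return ET.tostring(root, encoding="unicode")
```

Storing the old HTML as escaped text means malformed Dreamweaver markup can't break the XML document; an importer that reads the field's text value gets the original HTML back.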

Stephanie Leary built a really great plugin for WordPress that does just this. You could pull your static content into a WP install and export it to XML. It might be a quick and easy way to accomplish the task.

Hi,

Just curious, what kind of CMS will you be using?

Pem

dotCMS is the front-runner at the moment.

I've helped with a few such migrations. Getting the data areas themselves isn't hard; a good developer could write a script for that fairly easily. The problem is the formatting, accessibility, etc., especially with DW, where there tends to be a lot of garbage code. I've tried plugins and other methods, but in the end your best bet might be something along the lines of a student worker and a text editor. Anything else can cause all sorts of extra problems down the line.
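
A script won't fix accessibility, but it can take a first pass at presentational cruft before a person reviews each page. A minimal sketch using Python's standard `html.parser`: it unwraps `<font>` and `<center>` (keeping their text), drops a few presentational attributes, and discards comments, including Dreamweaver's template markers. The tag and attribute lists are assumptions to tune for your own pages, and `CruftStripper`/`strip_cruft` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Assumed cleanup policy; adjust these sets to suit your own pages.
UNWRAP_TAGS = {"font", "center"}  # drop the tag itself, keep its contents
DROP_ATTRS = {"style", "align", "valign", "bgcolor"}
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}
RAW_TAGS = {"script", "style"}  # contents passed through unescaped


class CruftStripper(HTMLParser):
    """Re-emit HTML minus unwrapped tags, dropped attributes and comments."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.out: list[str] = []
        self.in_raw = False

    def _emit_start(self, tag, attrs, self_closing):
        if tag in UNWRAP_TAGS:
            return
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self.in_raw = tag in RAW_TAGS
        self._emit_start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag in RAW_TAGS:
            self.in_raw = False
        if tag not in UNWRAP_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text (so "AT&T" becomes "AT&amp;T"), except inside
        # <script>/<style>, whose contents the parser hands over raw.
        self.out.append(data if self.in_raw else escape(data, quote=False))

    # handle_comment is not overridden, so comments (including Dreamweaver's
    # InstanceBeginEditable markers) are dropped from the output.


def strip_cruft(html: str) -> str:
    parser = CruftStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because `HTMLParser` lowercases tag and attribute names, uppercase `<FONT>` and `ALIGN=` from older pages are caught too. Named entities such as `&nbsp;` come out as the literal character, which is fine for UTF-8 output. This only narrows what the student worker has to fix by hand; it doesn't replace that review.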

Hi Stephen,

At NNU, we just completed a content migration from static HTML files (Dreamweaver) into Typo3 using a custom extension we built. I'd be happy to let you look at the code to see if you can use it. You might also consider Typo3 as your CMS (http://typo3.org).

How big is the site you are looking to migrate?

Best,

Zac

Stephen,

How deeply have you delved into this? Have you thought about site structure? Is that going to change at all? What about any attachments like PDFs and images that are part of the content?

The Typo3 extension we built not only imports the HTML files into the database; it also imports linked files such as PDFs (you specify the file types), rebuilds the links to work within the CMS, and, if you have Dreamweaver template tags around your content, targets only that area.
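
The extension's code isn't shown in this thread, so the following is an independent Python sketch of just one piece of what's described: finding links to attachment files by extension, collecting their site paths so the files can be copied, and pointing the links at a new location. The extension list, the `/files/migrated` prefix, and the function name are hypothetical. It only handles quoted `href`/`src` values, and page-to-page links (which would need mapping to CMS page IDs) are left untouched.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Hypothetical settings: which linked file types count as attachments, and
# the URL prefix the copied files will live under in the new CMS.
ATTACHMENT_EXTS = {".pdf", ".doc", ".docx", ".xls", ".gif", ".jpg", ".png"}
NEW_PREFIX = "/files/migrated"

# href="..." or src='...' (quoted values only).
LINK_RE = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["'])(?P<url>[^"']*)(?P=q)""",
    re.IGNORECASE,
)


def rewrite_attachments(html: str, page_url: str) -> tuple[str, list[str]]:
    """Repoint same-site attachment links; return (new_html, old_paths).

    old_paths lists the site paths of the files that need copying.
    """
    site = urlparse(page_url).netloc
    found: list[str] = []

    def swap(m: re.Match) -> str:
        # Resolve relative links (docs/x.pdf, ../x.pdf) against the page URL.
        target = urlparse(urljoin(page_url, m.group("url")))
        is_local = target.netloc == site
        ext = PurePosixPath(target.path).suffix.lower()
        if not is_local or ext not in ATTACHMENT_EXTS:
            return m.group(0)  # leave pages and external links alone
        found.append(target.path)
        return f"{m['attr']}{m['q']}{NEW_PREFIX}{target.path}{m['q']}"

    return LINK_RE.sub(swap, html), found
```

Returning the list of original paths keeps the copy step separate from the rewrite, so the same run can drive both the file transfer and the link fix-up.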

Of course, we're still doing some manual editing, but I processed a 35k+ file site in less than an hour and have needed only relatively small tweaks to what was imported (not counting actual restructuring).

The extension is still a little rough, but fully workable. We were planning to release it back to the Typo3 community after a little cleanup, but if you would like to take a gander at it, please feel free to contact Zac or me.

I have a BBEdit Text Factory that does this. If you're on a Mac, download BBEdit from barebones.com (free for 30 days), open the menu that looks like a gear, open the Text Factories folder, and put the file below into it (after extracting it from the .zip archive). Once it's in the folder, you can run the Factory over a whole folder or site full of documents.

http://www.uri.edu/news/hicks/uwebd_clean_DW.textfactory.zip

I would look into YQL (Yahoo! Query Language). You can build your own query that returns either an XML tree or a JSON structure, then parse the result with any language. I've never used it the way you're describing, but it's worth a shot.

Well, it doesn't look like it will do the job en masse, but DW CS5 can export a page's template data as an XML document. You can export the XML two ways: one that uses the editable regions' names as the XML tags, and one that uses Dreamweaver's own XML tags.

I recently built a migration script as a proof of concept to show that my college could move to Drupal fairly seamlessly.

Our current CMS (RedDot) can output valid XHTML files. I built a script in classic ASP (I'm currently porting it to PHP) that scraped these XHTML files using XPath and wrote the results to a CSV file. I just made sure to name elements consistently (e.g., <div id="header">) so that XPath could find them.
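
The same XPath-by-id approach can be sketched in Python with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset (enough for `.//*[@id='header']`; lxml offers full XPath 1.0 if you need more). This assumes well-formed XHTML as described above; ElementTree may reject pages that use named entities such as `&nbsp;`. The function names and sample ids are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_xhtml(text: str) -> ET.Element:
    """Parse an XHTML page and strip the namespace from every tag name."""
    root = ET.fromstring(text)
    for el in root.iter():
        # "{http://www.w3.org/1999/xhtml}div" -> "div", so paths stay short
        # and serialized fragments don't carry ns0: prefixes.
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    return root


def inner_xhtml(el: ET.Element) -> str:
    """Return an element's contents as markup, without its own tag."""
    # tostring() also writes each child's tail text, so this keeps the text
    # that sits between child elements.
    children = "".join(ET.tostring(child, encoding="unicode") for child in el)
    return ((el.text or "") + children).strip()


def pages_to_csv(pages: dict[str, str], ids: list[str]) -> str:
    """One CSV row per page, one column per element id (blank if absent)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["path", *ids])
    for path, text in pages.items():
        root = load_xhtml(text)
        row = [path]
        for element_id in ids:
            el = root.find(f".//*[@id='{element_id}']")
            row.append(inner_xhtml(el) if el is not None else "")
        writer.writerow(row)
    return buf.getvalue()
```

As in the original script, this only works if the ids are named consistently across pages; a page missing an id just gets an empty cell rather than stopping the run.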

UWEBD has been in existence for more than 10 years and is the very best email discussion list on the Internet, in any industry, on any topic

© 2019 Created by Mark Greenfield.