University Web Developers

We're in the process of moving from static HTML pages (Dreamweaver templates) to a content management system for our main university web site. Many CMSes have import utilities that accept CSV (comma-separated values) files or XML.

I am looking for a way to grab all of the content on our old pages and put it into one giant CSV file (or at least do a folder at a time) for bulk import into our new site. Finding the content areas on our pages would be pretty easy, since they're already clearly marked in the code (e.g., "<!-- InstanceBeginEditable name="Text" -->").

Does anyone have experience doing something like this?  Something like Mozenda (http://www.mozenda.com/) looks promising, but I'd prefer something that is a little cheaper or at least doesn't charge by the page.
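
For illustration, the marker-based extraction described above could be sketched roughly as follows in Python. This is only a sketch under assumptions: the closing marker is taken to be Dreamweaver's "<!-- InstanceEndEditable -->" comment, the CSV column names (path, region, html) are made up for the example and would need to match whatever the target CMS importer expects, and the function names are hypothetical.

```python
import csv
import io
import re
from pathlib import Path

# Dreamweaver wraps each editable region in a pair of HTML comments.
# Assumption: regions are closed with <!-- InstanceEndEditable -->.
REGION_RE = re.compile(
    r'<!--\s*InstanceBeginEditable\s+name="([^"]+)"\s*-->'
    r'(.*?)'
    r'<!--\s*InstanceEndEditable\s*-->',
    re.DOTALL,
)


def extract_regions(html):
    """Return a list of (region name, inner HTML) pairs found in one page."""
    return [(name, body.strip()) for name, body in REGION_RE.findall(html)]


def folder_to_pages(root):
    """Read every *.htm* file under root into {relative path: page text}."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(root.rglob("*.htm*"))
        if p.is_file()
    }


def pages_to_csv(pages, out):
    """Write one CSV row per (page path, region name, region HTML)."""
    writer = csv.writer(out)
    writer.writerow(["path", "region", "html"])  # hypothetical column names
    for path, html in pages.items():
        for name, body in extract_regions(html):
            writer.writerow([path, name, body])


# Small in-memory example so the sketch can be tried without any files.
SAMPLE_PAGE = """<html><body>
<!-- InstanceBeginEditable name="Text" -->
<p>Welcome to the department.</p>
<!-- InstanceEndEditable -->
</body></html>"""

buffer = io.StringIO()
pages_to_csv({"about/index.html": SAMPLE_PAGE}, buffer)
print(buffer.getvalue())
```

To run it over a real folder, pass folder_to_pages("some/folder") to pages_to_csv with a file opened using newline="" as the output. The csv module takes care of quoting any commas, quotes, and line breaks inside the HTML.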

Replies to This Discussion

I'm nearing the end of a 5-month CMS migration from Dreamweaver. I did it by hand (read: work-study students) because the HTML in Dreamweaver had 10 years of cruft built up and I didn't want that in my shiny new CMS. I could have done it in a couple days, though, with a Ruby script. If you don't have someone with those skills, I'm sure you could hire someone (or yours truly) for just a few hours to do it.

Jason

I'd second the notion that depending on how recently those pages were created, you could be bringing along font tags and the associated "cruft."

I'm working on something similar, parsing through HTML and producing the XML markup used by our new CMS. I just want to get stuff in. My tool of choice is Perl, given all the pattern-matching abilities of that language.
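
As a rough illustration of the pattern-matching approach to font-tag cruft, here is a minimal sketch in Python (not the Perl mentioned above). The function name is hypothetical, and regular expressions are fragile on messy HTML, so treat this as a first pass, not a complete cleaner.

```python
import re

# Matches opening and closing <font> tags, with or without attributes,
# in any letter case. The text between the tags is left untouched.
FONT_TAG_RE = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html):
    """Remove <font ...> and </font> tags but keep the content they wrap."""
    return FONT_TAG_RE.sub("", html)


print(strip_font_tags('<p><font face="Arial" size="2">Hi</font> there</p>'))
```

The same idea extends to other presentational leftovers, but each extra pattern is another place where a regex can misfire, so it is worth spot-checking the output on a sample of pages.
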
Stephanie Leary built a really great plugin for WordPress that does just this. You could pull your static content into a WP install and export to XML. Might be a quick and easy way to accomplish the task.

Hi,

Just curious, what kind of CMS will you be using?

Pem

dotCMS is the frontrunner at the moment.

I've helped with a few such migrations. Getting the data areas themselves isn't hard; a good developer could write a script for it rather easily. The problem is the formatting, accessibility, etc., especially with DW, where there tends to be a lot of garbage code. I've tried plugins and other methods, but in the end your best bet might be something along the lines of a student worker and a text editor. Anything else can cause all sorts of extra problems down the line.
Hi Stephen,

At NNU, we just accomplished content migration from static HTML files (Dreamweaver) into Typo3 using a custom extension we built. I'd be happy to let you look at the code and see if you can use it. You might also consider Typo3 as a CMS. http://typo3.org.

How big is the site you are looking to migrate?

Best,

Zac

Stephen,

How deeply have you delved into this? Have you thought about site structure? Is that going to change at all? What about any attachments like PDFs and images that are part of the content?

The Typo3 extension we built not only imports the HTML files into the db, it also imports any files, such as PDF (you specify file types), rebuilds the links to work with the CMS, and if you have Dreamweaver template tags around your content it can target only that area.

Of course there is still some manual editing we are doing, but I processed a 35k+ file site in less than an hour and have only needed to make relatively small tweaks to what was imported (not counting actual restructuring).

The extension is still a little rough, but fully workable. We were planning on releasing it back to the Typo3 community after a little cleanup, but if you would like to take a gander at it, please feel free to talk to Zac or me.

I have a BBEdit Text Factory that does this. If you're on a Mac, you can download BBEdit from barebones.com (free for 30 days), go to the menu that looks like a gear, open the Text Factories folder, and then put in the file below (once you've extracted it from the .zip archive). Once you put it into the folder, you can run the Factory over a whole folder/site full of documents.

http://www.uri.edu/news/hicks/uwebd_clean_DW.textfactory.zip

I would look into YQL (Yahoo! Query Language). You can build your own query that returns either an XML tree or a JSON structure, then parse the result with any language. I've never used it in the manner you're describing, but it's worth a shot.

Well, it doesn't look like it will do the job en masse, but DW CS5 can export the template data as an XML document. You can export the XML two ways: one that uses the editable region names as the XML tags and one that uses DW's own XML tags.

I recently built a migration script as a proof of concept that my college could make the change to Drupal pretty seamlessly.

Our current CMS (RedDot) can output valid XHTML files. I built a script in classic ASP (I'm currently porting it to PHP) that scraped these XHTML files using XPath and wrote the results to a CSV file. I just made sure to name elements consistently (e.g., <div id="header">) so that XPath could find them.
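
The XPath-to-CSV idea could look something like the sketch below. It uses Python's standard library instead of ASP or PHP, so it is not the script described above. The element ids ("header", "content"), the sample page, and the function names are assumptions for the example. It pulls out text only (markup inside each element is flattened), and ElementTree will reject pages that use named entities such as &nbsp; unless they are handled first.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so the XPath needs a prefix for it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}
FIELD_IDS = ["header", "content"]  # hypothetical, consistently named ids


def extract_fields(xhtml, field_ids=FIELD_IDS):
    """Return {id: text content} for each <div id="..."> found in one page."""
    root = ET.fromstring(xhtml)
    row = {}
    for field_id in field_ids:
        node = root.find(f".//x:div[@id='{field_id}']", XHTML_NS)
        # Missing elements become empty cells instead of raising an error.
        row[field_id] = "".join(node.itertext()).strip() if node is not None else ""
    return row


def rows_to_csv(rows, field_ids=FIELD_IDS):
    """Serialize a list of row dicts to CSV text, one column per id."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_ids)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="header"><h1>Admissions</h1></div>
    <div id="content"><p>Apply by <strong>March 1</strong>.</p></div>
  </body>
</html>"""

print(rows_to_csv([extract_fields(SAMPLE_XHTML)]))
```

If the importer needs the inner markup and not just the text, ET.tostring on each child element is one option, at the cost of namespace prefixes showing up in the output.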
