WordPress to Eleventy
I was self-hosting a blog, 'www.planetmediocrity.com', and wanted to make it static and port it to Eleventy.
I exported all the content as an XML file and then used wp2md to convert the large XML file into individual .md files.
Unfortunately, these were not in the correct format for Eleventy, so I threw together the PowerShell script below to convert the headers...
$workdir = "C:\Users\me\11ty\convert\posts"
$outdir = "C:\Users\me\11ty\convert\posts-output\"
$files = Get-ChildItem "$workdir\*.*" -Include *.md
$separator = "---"
$layout = "layouts/pmpost.njk"
Foreach ($file in $files){
$text = Get-Content $file
#Get the filename and set output dir
$filename = $file.Name
$filename = $outdir + $filename
#Get the title
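$title = $text | Select-String -Pattern 'title: ' | Select-Object -First 1
$title = $title.ToString()
# TrimStart() strips a set of characters rather than a prefix, so remove the prefix with a regex
$title = $title -replace '^\s*title:\s*', ''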
# Set the title as the description since there are none
$description = $title
# Get the date
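$date = $text | Select-String -Pattern 'created: ' | Select-Object -First 1
$date = $date.ToString()
# Remove the prefix with a regex, for the same reason as the title
$date = $date -replace '^\s*created:\s*', ''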
# convert to correct format
$date = $date -replace "/", '-'
# Get the body of the post by skipping the first 11 lines
$body = $text | select -Skip 11
# Write the file to the output directory
Write-Host "Writing to:" $filename
# I should add some replaces to fix images but I ran out of time
# $body = $body -replace "REGEX TO MATCH WP FORMAT","<img src="../../img/FILENAME FROM REGEX" alt="ALT FROM REGEX" /></br>
Add-Content $filename "$separator"
Add-Content $filename "title: $title"
Add-Content $filename "description: $description"
Add-Content $filename "date: $date"
Add-Content $filename "tags:"
Add-Content $filename " - planetmediocrity"
Add-Content $filename ""
Add-Content $filename "layout: $layout"
Add-Content $filename "$separator"
# Pass the array itself: "$body" would join every line of the post with spaces
Add-Content $filename $body
Write-Host "File written"
}
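A note on stripping the "title: " prefix, since it is an easy thing to trip over: String.TrimStart() treats its argument as a set of characters, not as a literal prefix, so it can eat the start of the title itself. The sketch below compares it with an anchored -replace. The sample header line is made up purely for illustration.

```powershell
# Hypothetical header line, used only for illustration
$line = 'title: little things'

# TrimStart() keeps stripping leading characters for as long as they
# appear anywhere in the set t, i, l, e, ':' and space
$trimmed = $line.TrimStart('title: ')

# -replace with an anchored regex removes only the literal prefix
$replaced = $line -replace '^title:\s*', ''

Write-Host "TrimStart: $trimmed"    # TrimStart: hings
Write-Host "-replace:  $replaced"   # -replace:  little things
```

Dates that start with a digit are not affected in the same way, but using -replace for both keeps the two cases consistent.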
I'm tidying up the body and relinking images etc. manually.
Update 13/04/2019: I noticed that all my images were followed by a </br> tag, which is just plain wrong! I made this mistake when manually fixing broken images after running the above script. I decided to convert all the HTML <img> tags to Markdown using Notepad++, which removes the need for a line break and reduces the amount of HTML in my source files.
After doing a couple manually, I decided to write the regex that I should have included in my original script. It takes the alt text from the original tag and adds it to the Markdown version.
From:
<img src="../../img/wpid-Screenshot_2013-01-12-17-51-21_0.jpg" alt="Shark in the meadows!" /></br>
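To:
![Shark in the meadows!](../../img/wpid-Screenshot_2013-01-12-17-51-21_0.jpg "Shark in the meadows!")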
In Notepad++ find and replace, I used the following regex.
Find what :
<img src="(.*)?".*"(.*)?" \/><\/br>
Replace with:
![\2]\(\1 "\2"\)
This finds a string that starts with <img src=" followed by 0 or more characters up until the next " (the file path), then 0 or more characters ending in another ", then 0 or more characters (the alt text) before " /></br>
The sections in brackets ( ) are stored as \1 and \2 respectively.
These are then used in the replace command to enter the file path and alt text.
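For completeness, here is roughly how the same replacement might look in PowerShell, which is what the commented-out line in the script was meant to do. This is a sketch rather than something I ran against the whole blog: it uses lazy groups (.*?) instead of the greedy ones above, and PowerShell's $1 and $2 in place of Notepad++'s \1 and \2.

```powershell
# Sample line taken from the example above
$line = '<img src="../../img/wpid-Screenshot_2013-01-12-17-51-21_0.jpg" alt="Shark in the meadows!" /></br>'

# Group 1 captures the file path, group 2 captures the alt text.
# Lazy groups stop at the earliest closing quote, so two tags on the
# same line are less likely to be merged into one match.
$pattern = '<img src="(.*?)".*?"(.*?)" /></br>'

# Single quotes stop PowerShell expanding $1 and $2 as variables,
# leaving them for the regex engine to fill in
$replacement = '![$2]($1 "$2")'

$markdown = $line -replace $pattern, $replacement
Write-Host $markdown
```

Since -replace works line by line on an array, something like $body = $body -replace $pattern, $replacement inside the loop, before the file is written, should cover every post in one pass.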