https://dltj.org/article/fixing-webmentions/
Peter Murray
2021-12-31T00:00:00+00:00
2024-07-20T16:35:17+00:00

Refactoring DLTJ, Winter 2021 Part 2.5: Fixing the Webmentions Cache

Posted on December 31, 2021 and updated on July 20, 2024

Okay, a half-step backward to fix something I broke yesterday. As I described earlier this year, this static website blog uses the Webmention protocol to notify others when I link to their content and to receive notifications when others link to mine. Behind the scenes, I’m using the Jekyll plugin called jekyll-webmention_io to integrate Webmention data into my blog’s content. Each time the contents of this site are built, that plugin contacts the Webmention.IO service to retrieve its Webmention data. (Webmention.IO holds onto it between Jekyll builds since there is no always-on “dltj.org” server to receive notifications from others.) The plugin caches that information to ease the burden on the Webmention.IO service.

The previous CloudFormation-based process was using AWS CodeBuild natively, and the Webmention cache was stored in CodeBuild’s caching function. CodeBuild automatically downloads the previous cache into the working directory for each build iteration and then automatically uploads the cache as the build is completed. Handy, right?

Well, AWS Amplify simplifies some of the setup of working with the underlying CodeBuild tool. One configuration option that is no longer available is the ability to specify which S3 bucket to use as the CodeBuild cache, so I couldn’t point it at the previous cache files, and all of the previous Webmention entries disappeared from the blog pages. Fortunately, I hadn’t decommissioned the CloudFormation stuff, so I still had access to the old cache; I was able to extract the four webmention files (but see below for a discussion about that).

Since Amplify doesn’t allow me to have direct access to the CodeBuild cache, I decided it was high time to use a dedicated cache location for these webmention files. To do that took three steps:

1. Create the S3 bucket (with no public access)
2. Add read/write policy for that bucket to the AWS role assigned to the Amplify app
3. Add lines to the amplify.yml file to copy files from the S3 bucket into and out of the working directory

For step 2, the IAM policy for the Amplify role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::org.dltj.webmentions-cache",
                "arn:aws:s3:::org.dltj.webmentions-cache/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*"
        }
    ]
}
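
As a rough sketch, a policy of this shape can be generated for any bucket name. S3 evaluates s3:ListBucket against the bucket ARN and the object actions (s3:GetObject, s3:PutObject, s3:DeleteObject) against object ARNs of the form bucket/*, so the sketch splits them into two statements. The function name webmention_cache_policy is made up for illustration, and leaving out s3:ListAllMyBuckets rests on the assumption that the copy commands do not need it.

```python
import json


def webmention_cache_policy(bucket_name):
    """Build an IAM policy document granting read/write access to one bucket.

    Hypothetical helper: the statement layout is an illustration, not the
    policy actually attached to the Amplify role.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: matched against the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: matched against objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(webmention_cache_policy("org.dltj.webmentions-cache"), indent=4))
```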

For the amplify.yml file:

version: 1
frontend:
  phases:
    preBuild:
      commands:
        - aws s3 cp s3://org.dltj.webmentions-cache webmentions-cache --recursive
        - rvm use $VERSION_RUBY_2_6
        - bundle install --path vendor/bundle
    build:
      commands:
        - rvm use $VERSION_RUBY_2_6
        - bundle exec jekyll build --trace
    postBuild:
      commands:
        - aws s3 cp webmentions-cache s3://org.dltj.webmentions-cache --recursive
  artifacts:
    baseDirectory: _site
    files:
      - '**/*'
  cache:
    paths:
      - 'vendor/**/*'
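
For running the same copy steps outside of Amplify (a local build, say), the two aws s3 cp commands could be wrapped in a small script. This is only a sketch: the copy_cache function and its run parameter are hypothetical names, and it assumes the AWS CLI is installed with credentials that can reach the bucket.

```python
import subprocess


def copy_cache(bucket, folder, direction, run=subprocess.run):
    """Copy the webmention cache between an S3 bucket and a local folder.

    direction is "download" (bucket to folder, before the Jekyll build) or
    "upload" (folder to bucket, after the build). The run parameter exists
    only so the command can be inspected without calling the AWS CLI.
    """
    remote = f"s3://{bucket}"
    if direction == "download":
        source, target = remote, folder
    elif direction == "upload":
        source, target = folder, remote
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    # Same command shape as the preBuild and postBuild lines in amplify.yml.
    command = ["aws", "s3", "cp", source, target, "--recursive"]
    run(command, check=True)
    return command
```

Calling copy_cache("org.dltj.webmentions-cache", "webmentions-cache", "download") before the build and the "upload" variant afterwards mirrors the preBuild and postBuild phases.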

And the webmentions part of the Jekyll _config.yml file:

webmentions:
  cache_folder: webmentions-cache

Contents of the AWS CodeBuild Cache File

Can we do a quick sidebar on the AWS CodeBuild caching mechanism? Because I was not expecting what I saw. The CodeBuild cache S3 bucket contains one file with a UUID as its name. That file is a tar-gzip’d archive of a flat directory containing sequentially numbered files (0 through 8099 in my case) and a codebuild.json table of contents:

{
  "version": "1.0",
  "content": {
    "files": [
      {
        "path": "vendor/s3deploy.tar.gz",
        "rel": "src"
      },
      {
        "path": "vendor/s3deploy",
        "rel": "src"
      },
      {
        "path": "vendor/LICENSE",
        "rel": "src"
      },
      {
        "path": "vendor/README.md",
        "rel": "src"
      },
      {
        "path": "vendor/webmentions",
        "rel": "src"
      },
      {
        "path": "vendor/webmentions/received.yml",
        "rel": "src"
      },
      {
        "path": "vendor/webmentions/lookups.yml",
        "rel": "src"
      },
      {
        "path": "vendor/webmentions/bad_uris.yml",
        "rel": "src"
      },
      {
        "path": "vendor/webmentions/outgoing.yml",
        "rel": "src"
      },
    ...

Each item in the files array corresponds to the numbered file whose name matches the item’s position in the array, counting from zero. (The item at index 4, vendor/webmentions, is a directory, so there was no corresponding file in the tar-gzip archive.) Fortunately, the four files I was looking for were near the top of the list and I didn’t have to go hunting through all eight-thousand-some-odd files to find them. (The s3deploy program is one that I found to intelligently copy modified files from the CodeBuild working directory to the S3 static website bucket.)
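
The lookup described above can be sketched in a few lines of Python. This is not an official tool: restore_from_cache is a made-up name, and the code assumes that entry N of the files array (zero-based) maps to the archive member whose base name is N, that codebuild.json sits alongside the numbered files, and that entries with no numbered member are directories to be skipped.

```python
import json
import os
import tarfile


def restore_from_cache(archive_path, dest_dir, wanted_prefix=""):
    """Restore files from a CodeBuild cache archive to their original paths.

    Only entries whose path starts with wanted_prefix are written out.
    Returns the list of restored relative paths.
    """
    restored = []
    with tarfile.open(archive_path, "r:gz") as tar:
        # The archive is flat, so index regular files by their base name.
        members = {
            os.path.basename(m.name): m for m in tar.getmembers() if m.isfile()
        }
        with tar.extractfile(members["codebuild.json"]) as toc_file:
            toc = json.load(toc_file)
        for index, entry in enumerate(toc["content"]["files"]):
            path = entry["path"]
            member = members.get(str(index))
            if member is None or not path.startswith(wanted_prefix):
                continue  # a directory entry, or a file we did not ask for
            if os.path.isabs(path) or ".." in path.split("/"):
                continue  # never write outside dest_dir
            target = os.path.join(dest_dir, path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            restored.append(path)
    return restored
```

With the archive downloaded locally, a call such as restore_from_cache("cache.tar.gz", "restored", "vendor/webmentions/") would pull out just the webmention files (the archive and folder names here are placeholders).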

I’m really wondering about the engineering requirements for all of this overhead. Why not just use a native tar-gzip archive without the process of parsing the table of contents and renaming the files?

Discussion

Mike Tⓐylor 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🇬🇧 🇪🇺

I have to ask … Why do you go to all this work instead of just hosting your blog on WordPress.com?

31 December 2021 | Permalink
Peter Murray

Fair question. Partially because I’m learning as I go, and that is rewarding. Partially because I want to be self-reliant (acknowledging the irony that I’m leaning heavily on AWS). Partially because sharing all of these small findings with the world is fun.

31 December 2021 | Permalink
