URL to PDF Kindle Forwarder
Problem
My dad likes to read articles on his Kindle because it is a better reading experience. To get an article onto the Kindle, he had to go to the page, download it as a PDF and then email the PDF to the Kindle's email address. His usual workflow was to email the URLs to one email address, then later save the pages as PDFs and email them in one batch.
Flow
The flow this project needed was to receive an email containing a URL and then email a PDF of that page to the Kindle's email address. This required invoking a Lambda whenever an email is received.
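The first step of that flow, pulling the URL out of the incoming email, could look roughly like the sketch below. The post does not show how the URL is read, so `extractFirstUrl` and the idea of scanning a plain-text field (subject or body) are assumptions for illustration only, and the real Lambda is written in Rust.

```typescript
// Hypothetical helper: return the first http(s) URL found in a piece of
// email text, or null when there is none.
function extractFirstUrl(text: string): string | null {
  // Match http:// or https:// followed by characters that cannot end a URL
  // in plain text (whitespace, angle brackets, quotes).
  const match = text.match(/https?:\/\/[^\s<>"']+/i);
  if (match === null) {
    return null;
  }
  // Strip punctuation that often trails a URL at the end of a sentence.
  return match[0].replace(/[.,;:!?)\]]+$/, "");
}

console.log(extractFirstUrl("Read this: https://example.com/articles/42."));
```

Returning `null` instead of throwing leaves it to the caller to decide what to do with an email that has no link in it.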
AWS architecture
I used TypeScript with CDK for my IaC. This was super helpful for creating development and production environments.
On previous projects I have used Go for my CDK, but I preferred using TypeScript. I find the jsii types awkward in non-TypeScript CDK projects.
I used Rust for the AWS Lambda.
Below is the AWS architecture diagram:
After a PDF of the article page is created, the Lambda uses AWS SES to email the PDF as an attachment to the Kindle.
Invoking lambda by receiving an email
The whole flow depends on this step, so getting it right was crucial.
I used a receipt rule set in AWS SES to invoke my Lambda when SES receives an email sent to example@myprojectdomain.oliverlooney.com. To get this to work I had to create an MX record in Route53 for myprojectdomain.oliverlooney.com. This MX record had to point to the SES inbound mail endpoint for the region the SES rule is in.
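As a rough illustration of what that MX record contains: SES inbound endpoints follow the pattern `inbound-smtp.<region>.amazonaws.com`. The helper name `sesInboundMxValue`, the priority of 10 and the `eu-west-1` region below are assumptions, since the post does not say which region or priority was used. The returned object has the `priority` and `hostName` fields that a CDK `route53.MxRecord` expects in its `values` list.

```typescript
// Shape of one MX entry (mirrors the priority/hostName pair used by CDK's MxRecord).
interface MxValue {
  priority: number;
  hostName: string;
}

// Hypothetical helper: build the MX value that routes mail for a domain
// to the SES email-receiving endpoint in the given region.
function sesInboundMxValue(region: string, priority: number = 10): MxValue {
  return {
    priority,
    hostName: `inbound-smtp.${region}.amazonaws.com`,
  };
}

// Region is an assumption; not every AWS region supports SES email receiving.
console.log(sesInboundMxValue("eu-west-1"));
```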
Below is my CDK code for creating the rule set:
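```typescript
// Module paths assume CDK v2 (aws-cdk-lib).
import * as iam from 'aws-cdk-lib/aws-iam';
import * as ses from 'aws-cdk-lib/aws-ses';
import * as sesActions from 'aws-cdk-lib/aws-ses-actions';

// The code below lives inside the stack constructor.
// Role that SES can assume, granted permission to invoke the Lambda.
const sesInvokeLambdaRole = new iam.Role(this, 'SesInvokeLambdaRole', {
  assumedBy: new iam.ServicePrincipal('ses.amazonaws.com'),
});

emailForwardingLambda.grantInvoke(sesInvokeLambdaRole);

// One rule set per environment, named via getNameForEnv.
const ruleSet = new ses.ReceiptRuleSet(this, getNameForEnv('RuleSet', props.environmentName), {
  receiptRuleSetName: getNameForEnv('EmailForwardingRuleSet', props.environmentName),
});

// Invoke the Lambda asynchronously for mail addressed to the invoking email.
new ses.ReceiptRule(this, getNameForEnv('EmailForwardingRule', props.environmentName), {
  ruleSet,
  recipients: [props.invokingEmail],
  actions: [
    new sesActions.Lambda({
      function: emailForwardingLambda,
      invocationType: sesActions.LambdaInvocationType.EVENT,
    }),
  ],
});
```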
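The snippet calls `getNameForEnv`, which is not shown in the post. One possible shape for it is sketched here, assuming two environments and a simple suffix scheme; the `EnvironmentName` type and the `Dev`/`Prod` suffixes are guesses, not the author's actual implementation.

```typescript
// Assumed set of environments, based on the development/production split
// mentioned earlier in the post.
type EnvironmentName = "development" | "production";

// Hypothetical helper: append an environment suffix so the same construct
// can be deployed side by side in both environments without name clashes.
function getNameForEnv(baseName: string, environmentName: EnvironmentName): string {
  const suffix = environmentName === "production" ? "Prod" : "Dev";
  return `${baseName}${suffix}`;
}

console.log(getNameForEnv("EmailForwardingRuleSet", "development"));
```

Keeping the naming in one function means resource names such as the receipt rule set name stay unique per environment, which matters for anything that must be unique within an account and region.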
First attempt - Lambda layers
My first attempt used a CLI tool called wkhtmltopdf. Since this is a CLI tool, I had to use a Lambda layer for my Lambda to use it. Fortunately, wkhtmltopdf has a download option for Lambda layers that I was able to use.
The first problem was that the default Lambda memory (128MB) did not give the function enough CPU for the program, because Lambda allocates CPU in proportion to the configured memory. The Lambda failed with wkhtmltopdf timing out.
I bumped up the memory to increase the CPU and tested it again, but the articles I was looking at did not come out right. The full article was not in the PDF and there were pop-ups covering the text.
Second attempt - PDFShift API
The next tool I used was PDFShift (docs.pdfshift.io). This was a lot simpler to use as it did not require a Lambda layer, just a simple API call. It does require an API key, so this was a good opportunity to use AWS Secrets Manager in a side project.
For getting rid of pop-ups, PDFShift has an option for sending a string of JavaScript code that will be executed on their side before the PDF is saved. So I grabbed the selector for the 'consent' button on the pop-up and passed in JavaScript code to click that button.
This worked well and I got usable PDFs!
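To make the request shape concrete, here is a minimal sketch that only builds the request rather than sending it. The endpoint, the `source` and `javascript` body fields and the `X-API-Key` header reflect my reading of the PDFShift docs and should be checked against them; `buildPdfShiftRequest` and the `#consent-accept` selector are hypothetical, and the real Lambda makes this call from Rust with the key loaded from Secrets Manager.

```typescript
// Plain description of an HTTP POST, so the sketch stays independent of any HTTP client.
interface PdfRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: describe a conversion request that clicks a consent
// button on the page before the PDF is rendered.
function buildPdfShiftRequest(articleUrl: string, apiKey: string, consentSelector: string): PdfRequest {
  // Script run in the rendered page; it does nothing if the button is absent.
  const script =
    `const b = document.querySelector(${JSON.stringify(consentSelector)}); ` +
    `if (b) { b.click(); }`;
  return {
    url: "https://api.pdfshift.io/v3/convert/pdf", // assumed endpoint
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey, // assumed auth header
    },
    body: JSON.stringify({ source: articleUrl, javascript: script }),
  };
}

const request = buildPdfShiftRequest("https://example.com/articles/42", "sk_placeholder", "#consent-accept");
console.log(request.body);
```

Using `JSON.stringify` on the selector keeps quotes inside it from breaking the injected script.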
After settling on this API, I decided to use Lambda Power Tuning to figure out what memory setting to use. Unsurprisingly, since the main work was being handled by PDFShift, the memory setting didn't make any real difference. I chose 512MB. But since this function will only be used roughly 50-150 times a month, cost isn't a real consideration for this project.
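A back-of-the-envelope calculation shows why cost is negligible at this volume. The 10-second average duration and the price of $0.0000166667 per GB-second are assumptions (the post gives neither, and current pricing should be checked for the region and architecture in use); only the 512MB setting and the upper bound of 150 invocations come from the text.

```typescript
// Estimate monthly Lambda compute cost: invocations x duration x memory (in GB) x unit price.
function monthlyLambdaComputeCostUsd(
  invocations: number,
  avgDurationSeconds: number,
  memoryMb: number,
  pricePerGbSecond: number,
): number {
  const gbSeconds = invocations * avgDurationSeconds * (memoryMb / 1024);
  return gbSeconds * pricePerGbSecond;
}

// 150 invocations x 10 s x 0.5 GB = 750 GB-seconds, roughly $0.0125 before any free tier.
const estimate = monthlyLambdaComputeCostUsd(150, 10, 512, 0.0000166667);
console.log(estimate.toFixed(4));
```

Even doubling the memory or the duration keeps the figure at a few cents a month, which fits the observation that the memory choice barely matters here.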
Future plans
One feature request is to organise the PDFs with 'tags'. The Kindle has a feature for the user to set tags and filter by them. I am looking at using an LLM to read the PDF and then assign or come up with tags.