Import Markdown Python

Python-Markdown provides two public functions, markdown and markdownFromFile, both of which wrap the public class Markdown. If you're processing one document at a time, the functions will serve your needs. However, if you need to process multiple documents, it may be advantageous to create a single instance of the Markdown class and pass multiple documents through it.

This script turns Markdown into HTML using the Python markdown library and wraps the result in a complete HTML document with default Bootstrap styling so that it's immediately printable. It requires the Python libraries jinja2, markdown, and mdx_smartypants.

With the md_in_html extension enabled, the content of a raw HTML block-level element can be parsed as Markdown by including a markdown attribute on the opening tag. The markdown attribute will be stripped from the output, while all other attributes will be preserved. The markdown attribute can be assigned one of three values: '1', 'block', or 'span'.

Convert Markdown to HTML in Python

The easiest way to convert is to use a string for input and a string for output:

    import markdown
    # Simple conversion in memory
    mdtext = '# Hello\n\n.Text.'
    html = markdown.markdown(mdtext)
    print(html)

Files can be used for input and output instead, via markdownFromFile.
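The difference between the one-off function and a reusable Markdown instance can be sketched as follows. The sample documents are made up for illustration; the call to reset() clears per-document state between conversions.

```python
import markdown

# One-off conversion: the module-level function is enough.
single = markdown.markdown("# Hello\n\nSome *text*.")

# Several documents: create one Markdown instance and reuse it.
md = markdown.Markdown()
documents = ["First **doc**", "Second *doc*"]  # illustrative inputs

pages = []
for text in documents:
    pages.append(md.convert(text))
    md.reset()  # clear state left over from the previous document

print(single)
print(pages)
```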

As developers, we rely on static analysis tools to check, lint and transform our code. We use these tools to help us be more productive and produce better code. However, when we write content using markdown the tools at our disposal are scarce.

In this article we describe how we developed a Markdown extension to address challenges in managing content using Markdown in Django sites.

Table of Contents

The Problem
Using Markdown
Validate and Transform Django Links
- Handling Internal and External Links
Conclusion

The Problem

Like every website, we have different types of (mostly) static content in places like our home page, FAQ section and 'About' page. For a very long time, we managed all of this content directly in Django templates.

When we finally decided it was time to move this content out of templates and into the database, we thought it best to use Markdown. It's safer to produce HTML from Markdown, it provides a certain level of control and uniformity, and it is easier for non-technical users to handle. As we progressed with the move, we noticed we were missing a few things:

Internal Links

Links to internal pages can get broken when the URL changes. In Django templates and views we use reverse and {% url %}, but this is not available in plain Markdown.

Copy Between Environments

Absolute internal links cannot be copied between environments. This can be resolved using relative links, but there is no way to enforce this out of the box.

Invalid Links

Invalid links can harm user experience and cause the user to question the reliability of the entire content. This is not something that is unique to Markdown, but HTML templates are maintained by developers who know a thing or two about URLs. Markdown documents on the other hand, are intended for non-technical writers.

Prior Work

When I was researching this issue I searched for Python linters, Markdown preprocessors and extensions to help produce better Markdown. I found very few results. One approach that stood out was to use Django templates to produce Markdown documents.

Preprocess Markdown using Django Template

Using Django templates, you can use template tags such as url to reverse URL names, as well as conditions, variables, date formats and all the other Django template features. This approach essentially uses Django template as a preprocessor for Markdown documents.

I personally felt like this may not be the best solution for non-technical writers. In addition, I was worried that providing access to Django template tags might be dangerous.

Using Markdown

With a better understanding of the problem, we were ready to dig a bit deeper into Markdown in Python.

Converting Markdown to HTML

To start using Markdown in Python, install the markdown package:

Next, create a Markdown object and use its convert method to turn some Markdown into HTML:
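A minimal sketch of these two steps, assuming the package was installed with pip (for example `pip install markdown`). The sample sentence is only an illustration.

```python
# Assumes the package is installed, e.g. with: pip install markdown
from markdown import Markdown

md = Markdown()

# convert() takes Markdown text and returns an HTML fragment as a string.
html = md.convert("My name is **Haki**")
print(html)
```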

You can now use this HTML snippet in your template.

Using Markdown Extensions

The basic Markdown processor provides the essentials for producing HTML content. For more 'exotic' options, the Python markdown package includes some built-in extensions. A popular extension is the 'extra' extension that adds, among other things, support for fenced code blocks:
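Enabling the 'extra' extension could look roughly like this. The fenced snippet inside the sample text is made up, and the fence marker is built programmatically only so that it can be embedded in this example.

```python
from markdown import Markdown

# Pass extension names when creating the Markdown instance.
md = Markdown(extensions=["extra"])

fence = "`" * 3  # three backticks, the fenced code block delimiter
text = f"Some code:\n\n{fence}python\nprint(1 + 1)\n{fence}\n"

# With 'extra' enabled, the fenced block is rendered as a <pre><code> element.
html = md.convert(text)
print(html)
```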

To extend Markdown with our unique Django capabilities, we are going to develop an extension of our own.

Creating a Markdown Extension to Process Inline Links

If you look at the source, you'll see that to convert markdown to HTML, Markdown uses different processors. One type of processor is an inline processor. Inline processors match specific inline patterns such as links, backticks, bold text and underlined text, and convert them to HTML.

The main purpose of our Markdown extension is to validate and transform links. So, the inline processor we are most interested in is the LinkInlineProcessor. This processor takes markdown in the form of [Haki's website](https://hakibenita.com), parses it and returns a tuple containing the link and the text.

To extend the functionality, we extend LinkInlineProcessor and create a Markdown.Extension that uses it to handle links:
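A minimal sketch of such an extension, assuming Python-Markdown 3.x. The helper get_site_domain and the domain 'example.com' are placeholders, and clean_link does nothing yet. The last lines show how the extension might be passed to a Markdown instance.

```python
from markdown import Markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import LINK_RE, LinkInlineProcessor


def get_site_domain() -> str:
    # Placeholder: return your own site's domain here.
    return "example.com"


def clean_link(href: str, site_domain: str) -> str:
    # Placeholder: validation and transformation would be plugged in here.
    return href


class DjangoLinkInlineProcessor(LinkInlineProcessor):
    def getLink(self, data, index):
        # Let the built-in processor parse the link, then clean the href.
        href, title, index, handled = super().getLink(data, index)
        clean_href = clean_link(href, get_site_domain())
        return clean_href, title, index, handled


class DjangoUrlExtension(Extension):
    def extendMarkdown(self, md):
        # Registering under the existing name 'link' replaces the built-in
        # link processor.
        md.inlinePatterns.register(
            DjangoLinkInlineProcessor(LINK_RE, md), "link", 160
        )


md = Markdown(extensions=[DjangoUrlExtension()])
html = md.convert("[Haki's website](https://hakibenita.com)")
print(html)
```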

Let's break it down:

The extension DjangoUrlExtension registers an inline link processor called DjangoLinkInlineProcessor. This processor will replace any other existing link processor.
The inline processor DjangoLinkInlineProcessor extends the built-in LinkInlineProcessor, and calls the function clean_link on every link it processes.
The function clean_link receives a link and a domain, and returns a transformed link. This is where we are going to plug in our implementation.

How to get the site domain

To identify links to your own site you must know the domain of your site. If you are using Django's sites framework you can use it to get the current domain.

I did not include this in my implementation because we don't use the sites framework. Instead, we set a variable in Django settings.

Another way to get the current domain is from an HttpRequest object. If content is only edited in your own site, you can try to plug the site domain from the request object. This may require some changes to the implementation.

To use the extension, add it when you initialize a new Markdown instance:

Great, the extension is being used and we are ready for the interesting part!

Validate and Transform Django Links

Now that we got the extension to call clean_link on all links, we can implement our validation and transformation logic.

Validating `mailto` Links

To get the ball rolling, we'll start with a simple validation. mailto links are useful for opening the user's email client with a predefined recipient address, subject and even message body.

A common mailto link can look like this:

This link will open your email client set to compose a new email to 'support@service.com' with subject line 'I need help!'.

mailto links do not have to include an email address. If you look at the 'share' buttons at the bottom of this article, you'll find a mailto link that looks like this:

This mailto link does not include a recipient, just a subject line and message body.

Now that we have a good understanding of what mailto links look like, we can add the first validation to the clean_link function:

To validate a mailto link we added the following code to clean_link:

Check if the link starts with mailto: to identify relevant links.
Split the link to its components using a regular expression.
Yank the actual email address from the mailto link, and validate it using Django's EmailValidator.

Notice that we also added a new type of exception called InvalidMarkdown. We defined our own custom Exception type to distinguish it from other errors raised by markdown itself.
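The mailto check described above can be sketched with the standard library alone. To keep the example runnable without Django, a deliberately naive regular expression stands in for Django's EmailValidator; the function name clean_mailto and the shape of InvalidMarkdown are assumptions for illustration.

```python
import re


class InvalidMarkdown(Exception):
    """Raised when Markdown content fails our own validation."""

    def __init__(self, error: str, value: str = "") -> None:
        super().__init__(error)
        self.error = error
        self.value = value


# Naive stand-in for Django's EmailValidator (assumption, not the real thing).
SIMPLE_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_mailto(href: str) -> str:
    # Only links that start with "mailto:" are relevant here.
    if not href.startswith("mailto:"):
        return href

    # Everything between "mailto:" and the first "?" is the recipient.
    email = re.match(r"^mailto:([^?]*)", href).group(1)

    # The recipient is optional, so only validate it when present.
    if email and not SIMPLE_EMAIL_RE.match(email):
        raise InvalidMarkdown("Invalid email address", value=email)
    return href


print(clean_mailto("mailto:support@service.com?subject=I need help!"))
print(clean_mailto("mailto:?subject=Interesting article"))
```

A link with a recipient and a subject passes, as does a link with no recipient at all, while a malformed address raises InvalidMarkdown.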

Custom error class

I wrote about custom error classes in the past, why they are useful and when you should use them.

Before we move on, let's add some tests and see this in action:

Great! Worked as expected.

Handling Internal and External Links

Now that we got our toes wet with mailto links, we can handle other types of links:

External Links

Links outside our Django app.
Must contain a scheme: either http or https.
Ideally, we also want to make sure these links are not broken, but we won't do that now.

Internal Links

Links to pages inside our Django app.
Link must be relative: this will allow us to move content between environments.
Use Django's URL names instead of a URL path: this will allow us to safely move views around without worrying about broken links in markdown content.
Links may contain query parameters (?) and a fragment (#).

SEO

From an SEO standpoint, public URLs should not change. When they do, you should handle it properly with redirects, otherwise you might get penalized by search engines.

With this list of requirements we can start working.

Resolving URL Names

To link to internal pages we want writers to provide a URL name, not a URL path. For example, say we have this view:

The URL path to this page is https://example.com/, the URL name is home. We want to use the URL name home in our markdown links, like this:

This should render to:

We also want to support query params and hash:

This should render to the following HTML:

Using URL names, if we change the URL path, the links in the content will not be broken. To check if the href provided by the writer is a valid url_name, we can try to reverse it:

The URL name 'home' points to the url path '/'. When there is no match, an exception is raised:

Before we move forward, what happens when the URL name includes query params or a hash:

This makes sense because query parameters and hash are not part of the URL name.

To use reverse and support query params and hashes, we first need to clean the value. Then, check that it is a valid URL name and return the URL path including the query params and hash, if provided:

This snippet uses a regular expression to split href at the first occurrence of either ? or #, and returns the parts.
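The splitting step can be sketched as follows. To keep the example independent of a Django project, a small dictionary stands in for the URLconf and fake_reverse stands in for Django's reverse; both are hypothetical, as are the function names.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a Django URLconf: URL name -> URL path.
URL_NAMES = {"home": "/"}


def fake_reverse(url_name: str) -> str:
    # Stand-in for django.urls.reverse; raises KeyError where Django
    # would raise NoReverseMatch.
    return URL_NAMES[url_name]


def split_href(href: str) -> Tuple[str, str]:
    # Group 1 is everything before the first "?" or "#";
    # group 2 is the rest (possibly empty).
    match = re.match(r"^([^?#]*)(.*)$", href)
    return match.group(1), match.group(2)


def resolve_url_name(
    href: str, reverse: Callable[[str], str] = fake_reverse
) -> Optional[str]:
    url_name, rest = split_href(href)
    try:
        path = reverse(url_name)
    except KeyError:
        return None  # not a known URL name
    # Re-attach the query params and fragment to the resolved path.
    return path + rest


print(resolve_url_name("home"))
print(resolve_url_name("home?utm_source=faq#top"))
print(resolve_url_name("no-such-name"))
```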

Make sure that it works:

Amazing! Writers can now use URL names in Markdown. They can also include query parameters and fragment to be added to the URL.

Handling External Links

To handle external links properly we want to check two things:

External links always provide a scheme, either http: or https:.
Prevent absolute links to our own site. Internal links should use URL names.

So far, we handled URL names and mailto links. If we passed these two checks it means href is a URL. Let's start by checking if the link is to our own site:

The function urlparse returns a named tuple that contains the different parts of the URL. If the netloc property equals the site_domain, the link is really an internal link.
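As a quick illustration, the check on netloc might look like this. The function name is_internal_link and the sample domain are assumptions.

```python
from urllib.parse import urlparse


def is_internal_link(href: str, site_domain: str) -> bool:
    # netloc is the "host[:port]" part of the URL.
    return urlparse(href).netloc == site_domain


parsed = urlparse("https://example.com/faq?page=2#top")
print(parsed.scheme, parsed.netloc, parsed.path, parsed.query, parsed.fragment)

print(is_internal_link("https://example.com/faq", "example.com"))
print(is_internal_link("https://www.google.com", "example.com"))
```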

If the URL is in fact internal, we need to fail. But, keep in mind that writers are not necessarily technical people, so we want to help them out a bit and provide a useful error message. We require that internal links use a URL name and not a URL path, so it's best to let writers know what is the URL name for the path they provided.

To get the URL name of a URL path, Django provides a function called resolve:

When a match is found, resolve returns a ResolverMatch object that contains, among other information, the URL name. When a match is not found, it raises an error:

This is actually what Django does under the hood to determine which view function to execute when a new request comes in.

To provide writers with better error messages we can use the URL name from the ResolverMatch object:

When we identify that the link is internal, we handle two cases:

We don't recognize the URL: The url is most likely incorrect. Ask the writer to check the URL for mistakes.
We recognize the URL: The url is correct so tell the writer what URL name to use instead.
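The two cases can be sketched like this. A dictionary of known paths stands in for Django's resolve so the example runs on its own; the mapping, the function name and the exact error messages are illustrative assumptions.

```python
from urllib.parse import urlparse


class InvalidMarkdown(Exception):
    """Raised when Markdown content fails our own validation."""


# Hypothetical stand-in for django.urls.resolve: URL path -> URL name.
PATH_TO_NAME = {"/": "home", "/faq/": "faq"}


def reject_internal_link(href: str, site_domain: str) -> str:
    parsed = urlparse(href)
    if parsed.netloc != site_domain:
        return href  # external link, nothing to complain about

    url_name = PATH_TO_NAME.get(parsed.path)
    if url_name is None:
        # Case 1: we don't recognize the URL, so it is probably a mistake.
        raise InvalidMarkdown(
            "Should not use absolute links to the current site. "
            "We couldn't find a match to this URL. Are you sure it exists?"
        )
    # Case 2: we recognize the URL, so suggest the URL name to use instead.
    raise InvalidMarkdown(
        "Should not use absolute links to the current site. "
        f'Try using the url name "{url_name}".'
    )


print(reject_internal_link("https://www.google.com", "example.com"))
```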

Let's see it in action:

Nice! External links are accepted and internal links are rejected with a helpful message.

Requiring Scheme

The last thing we want to do is to make sure external links include a scheme, either http: or https:. Let's add that last piece to the function clean_link:

Using the parsed URL we can easily check the scheme. Let's make sure it's working:
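A sketch of the scheme check on its own, using only the standard library. The function name require_scheme and the error message are assumptions.

```python
from urllib.parse import urlparse


class InvalidMarkdown(Exception):
    """Raised when Markdown content fails our own validation."""


def require_scheme(href: str) -> str:
    parsed = urlparse(href)
    # A link such as "www.google.com" has an empty scheme.
    if parsed.scheme not in ("http", "https"):
        raise InvalidMarkdown(
            "Must provide an absolute URL "
            "(be sure to include https:// or http://)"
        )
    return href


print(require_scheme("https://www.google.com"))
```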

We provided the function with a link that has no scheme, and it failed with a helpful message. Cool!

Putting it All Together

This is the complete code for the clean_link function:

To get a sense of what a real use case for all of these features looks like, take a look at the following content:

This will produce the following HTML:

Nice!

Conclusion

We now have a pretty sweet extension that can validate and transform links in Markdown documents! It is now much easier to move documents between environments and keep our content tidy and most importantly, correct and up to date!

Source

The full source code can be found in this gist.

Taking it Further

The capabilities described in this article worked well for us, but you might want to adjust it to fit your own needs.

If you need some ideas, then in addition to this extension we also created a markdown Preprocessor that lets writers use constants in Markdown. For example, we defined a constant called SUPPORT_EMAIL, and we use it like this:

The preprocessor will replace the string $SUPPORT_EMAIL with the text we defined, and only then render the Markdown.
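One way such a preprocessor could be written, assuming Python-Markdown 3.x. The class names, the CONSTANTS mapping, the email address and the priority value are illustrative and not taken from the original source.

```python
from markdown import Markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor

# Illustrative constants; the address is a made-up example.
CONSTANTS = {"SUPPORT_EMAIL": "support@example.com"}


class ConstantsPreprocessor(Preprocessor):
    def run(self, lines):
        # Preprocessors receive the document as a list of lines and
        # must return a list of lines.
        new_lines = []
        for line in lines:
            for name, value in CONSTANTS.items():
                line = line.replace(f"${name}", value)
            new_lines.append(line)
        return new_lines


class ConstantsExtension(Extension):
    def extendMarkdown(self, md):
        md.preprocessors.register(ConstantsPreprocessor(md), "constants", 25)


md = Markdown(extensions=[ConstantsExtension()])
html = md.convert("Questions? Write to $SUPPORT_EMAIL.")
print(html)
```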

In this tutorial, you learn how to convert Jupyter notebooks into Python scripts to make them testing and automation friendly using the MLOpsPython code template and Azure Machine Learning. Typically, this process is used to take experimentation / training code from a Jupyter notebook and convert it into Python scripts. Those scripts can then be used for testing and CI/CD automation in your production environment.

A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.

In this tutorial, you learn how to:

Clean nonessential code
Refactor Jupyter Notebook code into functions
Create Python scripts for related tasks
Create unit tests

Prerequisites

Generate the MLOpsPython template and use the experimentation/Diabetes Ridge Regression Training.ipynb and experimentation/Diabetes Ridge Regression Scoring.ipynb notebooks. These notebooks are used as an example of converting from experimentation to production. You can find these notebooks at https://github.com/microsoft/MLOpsPython/tree/master/experimentation.
Install nbconvert. Follow only the installation instructions under section Installing nbconvert on the Installation page.

Remove all nonessential code

Some code written during experimentation is only intended for exploratory purposes. Therefore, the first step to convert experimental code into production code is to remove this nonessential code. Removing nonessential code will also make the code more maintainable. In this section, you'll remove code from the experimentation/Diabetes Ridge Regression Training.ipynb notebook. The statements printing the shape of X and y and the cell calling features.describe are just for data exploration and can be removed. After removing nonessential code, experimentation/Diabetes Ridge Regression Training.ipynb should look like the following code without markdown:

Refactor code into functions

Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:

The Diabetes Ridge Regression Training notebook (experimentation/Diabetes Ridge Regression Training.ipynb)
The Diabetes Ridge Regression Scoring notebook (experimentation/Diabetes Ridge Regression Scoring.ipynb)

Refactor Diabetes Ridge Regression Training notebook into functions

In experimentation/Diabetes Ridge Regression Training.ipynb, complete the following steps:

Create a function called split_data to split the data frame into test and train data. The function should take the dataframe df as a parameter, and return a dictionary containing the keys train and test.
Move the code under the Split Data into Training and Validation Sets heading into the split_data function and modify it to return the data object.
Create a function called train_model, which takes the parameters data and args and returns a trained model.
Move the code under the heading Training Model on Training Set into the train_model function and modify it to return the reg_model object. Remove the args dictionary; the values will come from the args parameter.
Create a function called get_model_metrics, which takes parameters reg_model and data, and evaluates the model then returns a dictionary of metrics for the trained model.
Move the code under the Validate Model on Validation Set heading into the get_model_metrics function and modify it to return the metrics object.

The three functions should be as follows:
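A sketch of the three functions, following the steps above. The target column name "Y", the 80/20 split and the fixed random_state are assumptions about the notebook, and the small data frame at the end is synthetic data used only to exercise the functions.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def split_data(df):
    # Assumes the target column is named "Y".
    X = df.drop("Y", axis=1).values
    y = df["Y"].values
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    return {
        "train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test},
    }


def train_model(data, args):
    # args holds the Ridge hyperparameters, e.g. {"alpha": 0.5}.
    reg_model = Ridge(**args)
    reg_model.fit(data["train"]["X"], data["train"]["y"])
    return reg_model


def get_model_metrics(reg_model, data):
    preds = reg_model.predict(data["test"]["X"])
    mse = mean_squared_error(data["test"]["y"], preds)
    return {"mse": mse}


# Synthetic data, only to show how the functions fit together.
df = pd.DataFrame({"a": range(10), "b": range(10, 20), "Y": range(20, 30)})
data = split_data(df)
model = train_model(data, {"alpha": 0.5})
metrics = get_model_metrics(model, data)
print(metrics)
```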

Still in experimentation/Diabetes Ridge Regression Training.ipynb, complete the following steps:

Create a new function called main, which takes no parameters and returns nothing.
Move the code under the 'Load Data' heading into the main function.
Add invocations for the newly written functions into the main function:
Move the code under the 'Save Model' heading into the main function.

The main function should look like the following code:

At this stage, there should be no code remaining in the notebook that isn't in a function, other than import statements in the first cell.

Add a statement that calls the main function.

After refactoring, experimentation/Diabetes Ridge Regression Training.ipynb should look like the following code without the markdown:

Refactor Diabetes Ridge Regression Scoring notebook into functions

In experimentation/Diabetes Ridge Regression Scoring.ipynb, complete the following steps:

Create a new function called init, which takes no parameters and returns nothing.
Copy the code under the 'Load Model' heading into the init function.

The init function should look like the following code:

Once the init function has been created, replace all the code under the heading 'Load Model' with a single call to init as follows:

In experimentation/Diabetes Ridge Regression Scoring.ipynb, complete the following steps:

Create a new function called run, which takes raw_data and request_headers as parameters and returns a dictionary of results as follows:
Copy the code under the 'Prepare Data' and 'Score Data' headings into the run function.
The run function should look like the following code (Remember to remove the statements that set the variables raw_data and request_headers, which will be used later when the run function is called):

Once the run function has been created, replace all the code under the 'Prepare Data' and 'Score Data' headings with the following code:

The previous code sets variables raw_data and request_header, calls the run function with raw_data and request_header, and prints the predictions.

After refactoring, experimentation/Diabetes Ridge Regression Scoring.ipynb should look like the following code without the markdown:

Combine related functions in Python files

Third, related functions need to be merged into Python files to better help code reuse. In this section, you'll be creating Python files for the following notebooks:

The Diabetes Ridge Regression Training notebook (experimentation/Diabetes Ridge Regression Training.ipynb)
The Diabetes Ridge Regression Scoring notebook (experimentation/Diabetes Ridge Regression Scoring.ipynb)

Create Python file for the Diabetes Ridge Regression Training notebook

Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of experimentation/Diabetes Ridge Regression Training.ipynb:

Once the notebook has been converted to train.py, remove any unwanted comments. Replace the call to main() at the end of the file with a conditional invocation like the following code:
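The conditional invocation is the usual Python entry-point guard. In this sketch the body of main is a placeholder for the real training steps.

```python
def main():
    # Placeholder body: the real main loads data, trains and saves the model.
    print("Running training")


# Runs main() only when the file is executed directly (python train.py),
# not when its functions are imported from another module or a test.
if __name__ == "__main__":
    main()
```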

Your train.py file should look like the following code:

train.py can now be invoked from a terminal by running python train.py. The functions from train.py can also be called from other files.

The train_aml.py file found in the diabetes_regression/training directory in the MLOpsPython repository calls the functions defined in train.py in the context of an Azure Machine Learning experiment run. The functions can also be called in unit tests, covered later in this guide.

Create Python file for the Diabetes Ridge Regression Scoring notebook

Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of experimentation/Diabetes Ridge Regression Scoring.ipynb:

Once the notebook has been converted to score.py, remove any unwanted comments. Your score.py file should look like the following code:

The model variable needs to be global so that it's visible throughout the script. Add the following statement at the beginning of the init function:

After adding the previous statement, the init function should look like the following code:
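A sketch of how init and run might fit together in score.py once model is global. The default file name, the "data" key in the request and the optional model_path parameter are assumptions. The usage lines train a tiny throwaway model so the example runs without the real model file.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge

model = None


def init(model_path="sklearn_regression_model.pkl"):
    # The global statement makes the loaded model visible to run().
    # The optional path is an addition for this sketch; init() still
    # works when called with no arguments.
    global model
    model = joblib.load(model_path)


def run(raw_data, request_headers):
    # Assumes the request body is JSON with a "data" list of feature rows.
    data = np.array(json.loads(raw_data)["data"])
    result = model.predict(data)
    return {"result": result.tolist()}


# Usage with a throwaway model saved to a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    tiny_model = Ridge(alpha=0.5).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
    joblib.dump(tiny_model, path)
    init(path)

prediction = run(json.dumps({"data": [[1.0], [3.0]]}), {})
print(prediction)
```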

Create unit tests for each Python file

Fourth, create unit tests for your Python functions. Unit tests protect code against functional regressions and make it easier to maintain. In this section, you'll be creating unit tests for the functions in train.py.

train.py contains multiple functions, but we'll only create a single unit test for the train_model function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit Pytest.

A unit test usually contains three main actions:

Arrange object - creating and setting up necessary objects
Act on an object
Assert what is expected

The unit test will call train_model with some hard-coded data and arguments, and validate that train_model acted as expected by using the resulting trained model to make a prediction and comparing that prediction to an expected value.
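Such a test might be sketched as below, with the three actions marked in comments. train_model is repeated here so the example is self-contained; in the project it would be imported from train.py. The hard-coded data and alpha value are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge


def train_model(data, args):
    # Same shape as the function described earlier in the tutorial.
    reg_model = Ridge(**args)
    reg_model.fit(data["train"]["X"], data["train"]["y"])
    return reg_model


def test_train_model():
    # Arrange: hard-coded training data and hyperparameters.
    X_train = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
    y_train = np.array([10, 9, 8, 8, 6, 5])
    data = {"train": {"X": X_train, "y": y_train}}

    # Act: train the model.
    reg_model = train_model(data, {"alpha": 1.2})

    # Assert: predictions match the values expected for this data.
    preds = reg_model.predict([[1], [2]])
    np.testing.assert_almost_equal(preds, [9.93939393939394, 9.03030303030303])
```

With Pytest installed, saving this in a file whose name starts with test_ and running pytest in that directory would discover and run the test.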

Next steps

Now that you understand how to convert from an experiment to production code, see the following links for more information and next steps: