Python Hack for RTF text

Ravi Nalla
1 min read · Jan 22, 2022

I work with a read-only MS SQL Server database day in and day out to create reports. As with many legacy systems, hiccups are common: I came across a data column that looked normal in the UI, but downloading the data to a spreadsheet filled it with unwanted text (RTF tags). The rtf_to_text function from the striprtf package solved the issue and gave me the plain text from the column.

# Install striprtf first if it isn't already installed
# (run this in a shell, not inside Python):
#   pip install striprtf

# Import the function needed
from striprtf.striprtf import rtf_to_text

# orig_text is a string that contains RTF tags.
# The r prefix (raw string) keeps Python from treating
# sequences such as \r and \b as escape characters.
orig_text = r'''{\rtf1\deff0
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;}
This line is the default color\line
\cf2
This line is red\line
\cf1
This line is the default color
}'''

# Strip the RTF tags and print the plain text
mod_text = rtf_to_text(orig_text)
print(mod_text)
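A caveat when pasting RTF into a Python string literal: in a regular (non-raw) literal, Python reads sequences such as \r and \b as escape characters, so tags like \rtf1, \red255 and \blue0 lose their backslashes before any RTF parser sees them. The sketch below uses only the standard library to show the difference. The one-line RTF fragment and the helper name count_backslashes are made up for illustration.

```python
# Hypothetical one-line RTF fragment, used only for illustration.
# Regular literal: \r becomes a carriage return, \b becomes a backspace.
regular = '{\rtf1 \red255\blue0 Hello}'
# Raw literal: the backslashes are kept exactly as typed.
raw = r'{\rtf1 \red255\blue0 Hello}'


def count_backslashes(text):
    """Return how many literal backslash characters the string contains."""
    return text.count('\\')


print(count_backslashes(regular))        # 0
print(count_backslashes(raw))            # 3
print('\r' in regular, '\b' in regular)  # True True
```

With the backslashes gone, the text no longer contains the RTF control words it started with, which is one way pasted RTF can end up different from the source.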

In my experience, though, reading the RTF text into Python from a file and then applying the function above worked better than copy-pasting the RTF text from the source into Python, because the unicode characters might change in the copy-paste process.
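A minimal sketch of the file-based approach might look like the following. The helper names read_rtf_source and rtf_file_to_text, the file name in the usage comment, and the utf-8 default encoding are all assumptions for illustration; an RTF file exported from another system may need a different encoding such as cp1252.

```python
from pathlib import Path


def read_rtf_source(path, encoding="utf-8"):
    """Read an RTF file and return its contents as stored on disk.

    Text read from a file is not parsed as a Python literal, so the
    backslashes in the RTF tags are preserved. The utf-8 default is an
    assumption; pass another encoding if the file needs one.
    """
    return Path(path).read_text(encoding=encoding)


def rtf_file_to_text(path, encoding="utf-8"):
    """Read an RTF file and return its plain text via striprtf."""
    # Imported here so read_rtf_source works even without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    return rtf_to_text(read_rtf_source(path, encoding))


# Usage (hypothetical file name):
# plain_text = rtf_file_to_text("exported_column.rtf")
# print(plain_text)
```

Saving the column value to a file first keeps the RTF source out of the Python code entirely, so nothing has to be pasted into a string literal.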

Hope this is helpful to someone.

Written by Ravi Nalla

A Data guy, hustling to be a full-time Data Engineer. Fun Fact: Majored in Pharmacy, Chemistry, Information Systems. www.linkedin.com/in/ravi-nalla
