Python Hack for RTF text
I work with a read-only MS SQL Server database day in and day out to create reports. Like many legacy systems, it has its hiccups. I came across a data column that looked normal in the UI, but when I downloaded the data to a spreadsheet, the column was full of unwanted text (RTF tags). A handy function called rtf_to_text, from the striprtf package, solved the issue and gave me the plain text from the column.
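The spreadsheet problem can also be handled at export time. The sketch below is only an assumption about how such an export might look, not the exact workflow from this post. It writes query rows to a CSV file with Python's csv module and converts only the values in one column that start with the RTF header `{\rtf`. The converter is passed in as a parameter (in practice, striprtf's rtf_to_text), and the names `export_clean_csv` and `looks_like_rtf` are hypothetical.

```python
import csv
from typing import Callable, Iterable, Sequence, TextIO


def looks_like_rtf(value: object) -> bool:
    """Return True for strings that begin with the RTF header '{\\rtf'."""
    return isinstance(value, str) and value.lstrip().startswith("{\\rtf")


def export_clean_csv(
    rows: Iterable[Sequence[object]],
    header: Sequence[str],
    rtf_column: str,
    convert: Callable[[str], str],
    out: TextIO,
) -> None:
    """Write rows to `out` as CSV, converting RTF values in `rtf_column` to plain text.

    `convert` is assumed to be something like striprtf's rtf_to_text.
    Values that do not look like RTF are written unchanged.
    """
    col = list(header).index(rtf_column)  # position of the RTF column
    writer = csv.writer(out)
    writer.writerow(header)
    for row in rows:
        values = list(row)
        if looks_like_rtf(values[col]):
            # Replace the RTF markup with its plain-text content.
            values[col] = convert(values[col]).strip()
        writer.writerow(values)
```

With a DB-API driver such as pyodbc, `header` could come from `[d[0] for d in cursor.description]` and `rows` from `cursor.fetchall()`. The output file should be opened with `newline=""`, as the csv module documentation recommends.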
# Install striprtf first if it isn't already installed:
#   pip install striprtf

# Import the function needed
from striprtf.striprtf import rtf_to_text

# orig_text is a variable holding a string with RTF tags.
# The r prefix (raw string) stops Python from reading \r in \rtf1 or \b in \blue0 as escape characters.
orig_text = r'''{\rtf1\deff0
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;}
This line is the default color\line
\cf2
This line is red\line
\cf1
This line is the default color
}'''

# Convert the RTF to plain text and print it
mod_text = rtf_to_text(orig_text)
print(mod_text)
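Because RTF control words start with a backslash, pasting RTF into an ordinary (non-raw) Python string literal can silently change it. `\r` (as in `\rtf1` and `\red255`) becomes a carriage return and `\b` (as in `\blue0`) becomes a backspace, while sequences such as `\l`, `\c`, `\d` and `\g` are kept as typed (recent Python versions warn about them). The small standard-library check below shows the difference the `r` prefix makes; the RTF fragment is a made-up example.

```python
# The same RTF fragment typed into an ordinary string literal and a raw one.
plain = '{\rtf1 \red255\blue0}'  # \r becomes a carriage return, \b a backspace
raw = r'{\rtf1 \red255\blue0}'   # r prefix: every backslash is kept as typed

print(repr(plain))
print(repr(raw))

# The ordinary literal lost its backslashes and gained control characters.
print(len(plain), len(raw))  # prints: 18 21
print("\r" in plain, "\\rtf1" in raw)  # prints: True True
```

A string mangled this way no longer contains the control words that an RTF parser looks for, which is one reason RTF typed or pasted straight into code can convert badly.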
However, in my experience it worked much better to read the RTF text into Python from a file and then apply the function above, rather than copy-pasting the RTF text from the source into Python, because Unicode characters can change during copy-paste.
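Reading the RTF from a file can be sketched with the standard library's pathlib. Text read from a file never goes through Python's string-literal escape processing, so control words such as `\rtf1` and `\blue0` arrive intact. The function name `convert_rtf_file` and the UTF-8 default encoding are assumptions. RTF files are typically 7-bit ASCII, but a file saved in another code page would need a different `encoding`. The converter is a parameter so the sketch runs without striprtf installed.

```python
from pathlib import Path
from typing import Callable, Union


def convert_rtf_file(
    path: Union[str, Path],
    convert: Callable[[str], str],
    encoding: str = "utf-8",
) -> str:
    """Read an RTF file as text and return convert(raw_rtf).

    `convert` is meant to be striprtf's rtf_to_text; backslashes in the
    file are read literally, with no escape processing.
    """
    raw_rtf = Path(path).read_text(encoding=encoding)
    return convert(raw_rtf)
```

With striprtf installed, a call would look like `convert_rtf_file("notes.rtf", rtf_to_text)`, where `notes.rtf` is a hypothetical file exported from the source system.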
Hope this is helpful to someone.