Tidy Paste: cleaning up text copied from PDF files

PDF files are not designed for copy-pasting, but we often need to copy text from a PDF file into another text editor. The pasted text then looks like this:
The wearable device of claim 4 , further comprising a
cover mounted over an open end of the cavity to enclose the
wearable antenna between the wearable body and the cover , wherein the cover is made of an electrically non – conductive
material and is permeable to radio waves in at least an operation frequency range of the wearable antenna

Sentences are broken into separate lines, and it takes tedious editing to make the text look right again in a text editor such as Microsoft Word.

Today I made a small program called “Tidy Paste” that solves this problem.

The user can run this program in the background. When some text needs cleaning, they copy it as usual with Ctrl+C, then press the hotkey Ctrl+Alt+T so that the program cleans up the text on the clipboard. When the user then presses Ctrl+V, the text is pasted like this:

The wearable device of claim 4 , further comprising a cover mounted over an open end of the cavity to enclose the wearable antenna between the wearable body and the cover , wherein the cover is made of an electrically non – conductive material and is permeable to radio waves in at least an operation frequency range of the wearable antenna
So the program joins the separated lines back into paragraphs, which largely removes the need to delete the extra line breaks by hand. The downside is that you have to press one more shortcut between Ctrl+C and Ctrl+V. The program can also convert the copied text to lowercase, remove extra spaces, and remove empty lines. All of these options can be selected in the program window.
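
To give an idea of what the extra options do, here is a minimal sketch in Python. It is not the program's actual source, only an illustration; the function and parameter names (`collapse_spaces`, `remove_empty_lines`, `tidy`, `squeeze_spaces`, `drop_empty`) are made up for this example.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace runs of spaces/tabs with one space and trim each line."""
    return "\n".join(
        re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")
    )


def remove_empty_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


def tidy(text: str, lowercase: bool = False,
         squeeze_spaces: bool = False, drop_empty: bool = False) -> str:
    """Apply the selected clean-up options (hypothetical option names)."""
    if squeeze_spaces:
        text = collapse_spaces(text)
    if drop_empty:
        text = remove_empty_lines(text)
    if lowercase:
        text = text.lower()
    return text
```

Each option is independent, so switching one on does not affect the others, much like ticking separate checkboxes in the program window.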

Here is a small demo of it working in real life:

As the video shows, LibreOffice Writer already implements some algorithm to remove line breaks, but it can’t recognize which line breaks are intended and should stay. The Tidy Paste program removes the line breaks that split a sentence and leaves the other line breaks intact.
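
One possible way to decide which line breaks to keep is sketched below in Python. This is an assumed heuristic for illustration, not necessarily what the program does: a break is kept if the line before it ends with sentence-ending punctuation or if the line is blank, and every other break is replaced by a space. The names `SENTENCE_END` and `join_broken_lines` are hypothetical.

```python
# Assumption: a line ending in one of these characters finishes a sentence,
# so the line break after it is treated as intentional.
SENTENCE_END = (".", "!", "?", ":", ";")


def join_broken_lines(text: str) -> str:
    """Join lines that were split mid-sentence; keep the other breaks."""
    out = []
    current = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line is an intentional paragraph break: flush and keep it.
            if current:
                out.append(current)
                current = ""
            out.append("")
            continue
        # Glue this line onto the unfinished sentence, if there is one.
        current = f"{current} {stripped}" if current else stripped
        if stripped.endswith(SENTENCE_END):
            out.append(current)
            current = ""
    if current:
        out.append(current)
    return "\n".join(out)
```

A heuristic like this will misjudge lines that happen to end with an abbreviation such as “e.g.”, so it should be read as a starting point only.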
However, there is one bug when I press the Ctrl+Alt+T shortcut in the Google remote window: it seems to trigger input of the “TM” sign in Office Word. So I may need to implement another shortcut that doesn’t interfere with other text-editing software. Anyway, if anyone is interested in such a program, let me know. I can share the executable on GitHub.
