Using Files with non-English or Special Characters


When working with files that contain Unicode, non-English, or special characters (for example, a CSV file that uses scientific symbols), ensure that the file to be imported is encoded as UTF-8. This allows Posit programs to read the characters stored in the file correctly.

Files can be converted to UTF-8 in several ways. Feel free to use other programs such as Excel, Sublime Text, or Notepad++ to accomplish this; the methods below, however, should be accessible to most users.

To save a file using UTF-8 encoding in Windows:

  • Open the file using Notepad.
  • Click File > Save As.
  • In the dialog window that appears, select "UTF-8" from the Encoding field, then save the file.
  • Your non-English characters should now be displayed correctly when imported into Posit Workbench, Desktop Pro, or Connect applications.

To convert a file to UTF-8 from the command line on *nix systems such as macOS and Linux, use the iconv command.

First, find the encoding of the current file. The command below is the Linux form; on macOS, use file -I (capital I) instead:

file -i originalfile.csv

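Next, run the iconv command. Replace original_charset with the current encoding of the file as reported by the previous command (e.g., ISO-8859-1), originalfile.csv with the current name of the file, and newfile.csv with the name you wish to give the UTF-8 file. Plain ASCII files are already valid UTF-8 and do not need converting.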

iconv -f original_charset -t utf-8 originalfile.csv > newfile.csv
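The iconv step above can be wrapped in a small shell function so that a failed conversion does not leave a partial output file behind. This is only a minimal sketch: the function name convert_to_utf8 and the temporary file naming are hypothetical choices for illustration, not part of any Posit tooling, and the source charset is assumed to be known already.

```shell
#!/bin/sh
# Hypothetical helper (not part of iconv or any Posit product):
# convert SRC from CHARSET to UTF-8 and write the result to DEST.
# Usage: convert_to_utf8 CHARSET SRC DEST
convert_to_utf8() {
    charset=$1
    src=$2
    dest=$3
    # Write to a temporary file first so DEST only appears on success.
    tmp="$dest.tmp.$$"
    if iconv -f "$charset" -t utf-8 "$src" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        # iconv failed (unknown charset, invalid input, missing file):
        # discard the partial output and report failure to the caller.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, convert_to_utf8 ISO-8859-1 originalfile.csv newfile.csv would behave like the command above, except that newfile.csv is only created if iconv exits successfully.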

Then import the new UTF-8 encoded file into your Posit application. The characters should now display correctly.
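Before importing, it can be useful to confirm that the new file really is valid UTF-8. One common approach, sketched here, is to ask iconv to convert the file from UTF-8 to UTF-8 and check its exit status, since iconv stops with an error on byte sequences that are not valid in the source encoding. The function name is_valid_utf8 is a hypothetical choice for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if FILE decodes
# cleanly as UTF-8. The converted output itself is discarded.
# Usage: is_valid_utf8 FILE
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}
```

A typical use would be: is_valid_utf8 newfile.csv && echo "newfile.csv is valid UTF-8". Note that this only checks that the bytes form valid UTF-8; it cannot tell whether the original charset given to iconv was the right one.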
