encoding - How to read text copied from the web into a txt file using Python


I'm learning how to read text files. I used this approach:

f=open("sample.txt")
print(f.read())

It worked fine when I typed the txt file myself. But when I copied text from a news article on the web, it produced the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 738: character maps to <undefined>

I tried changing the encoding setting in Notepad++ to UTF-8, since I read somewhere that the error was due to that.

I also tried using:

f=open("sample.txt",encoding='utf-8') 

from here

But it still didn't work.

You're on Windows, and you're trying to print to the console. It is print() that is throwing the exception, not the file read.

The Windows console natively supports only 8-bit code pages, so any character outside your region's code page will break (despite what people say about chcp 65001).
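To see why the failure comes from encoding for output rather than from reading the file, here is a minimal sketch. It assumes cp437 as a stand-in for a legacy console code page; the code page on your machine may differ, and the sample text is made up for illustration.

```python
# Sample text containing U+2014, the character named in the error message.
text = "Prices rose \u2014 again"

# Encoding to UTF-8 succeeds for any Unicode text.
utf8_bytes = text.encode("utf-8")

# cp437 is an assumed example of an 8-bit console code page; it has no
# slot for U+2014, so encoding fails in the same way print() did.
try:
    text.encode("cp437")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("cannot encode:", exc.reason)

# A lossy fallback: unencodable characters are replaced with "?".
safe = text.encode("cp437", errors="replace").decode("cp437")
print(safe)
```

The errors="replace" fallback loses information, so it is only suitable when an approximate display is good enough.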

You need to install and use https://github.com/drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters, for both input and output.

Alternatively, don't print to the console; write your output to a file opened with an explicit encoding instead. For example:

with open("myoutput.log", "w", encoding="utf-8") as my_log:
    my_log.write(body)

Ensure you also open the input file with the correct encoding.
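Putting both halves together, a rough end-to-end sketch might look like the following. It assumes the source file was saved as UTF-8; the temporary directory and the sample sentence are placeholders so the snippet runs on its own, and in practice you would point it at your real sample.txt.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.txt")
dst = os.path.join(workdir, "myoutput.log")

# Stand-in for text copied from a web page (placeholder content).
with open(src, "w", encoding="utf-8") as f:
    f.write("Prices rose \u2014 again\n")

# Read with an explicit encoding instead of the platform default.
with open(src, encoding="utf-8") as f:
    body = f.read()

# Write to a file with an explicit encoding instead of printing.
with open(dst, "w", encoding="utf-8") as my_log:
    my_log.write(body)
```

Passing encoding explicitly on both open() calls keeps the result independent of the platform's default encoding.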

