<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: python, re-encoding incorrectly encoded string</title>
	<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/</link>
	<description>advocating olpa's open source developments</description>
	<pubDate>Fri, 29 Aug 2008 19:57:50 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8698</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:34:42 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8698</guid>
		<description>I can't believe it! Actually, the problematic symbol was the Russian small letter "р" ("r"). Old-time programmers may remember plenty of problems with this letter back in the 1990s.

Thanks, Adobe, for the reminder!</description>
		<content:encoded><![CDATA[<p>I can&#8217;t believe it! Actually, the problematic symbol was the Russian small letter &#8220;р&#8221; (&#8220;r&#8221;). Old-time programmers may remember plenty of problems with this letter back in the 1990s.</p>
<p>Thanks, Adobe, for the reminder!</p>
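<p>This quick check suggests why this particular letter could break a Latin-1 based repair. It rests on a hypothesis drawn from the note elsewhere in these comments that the source data was Mac Roman: Russian text stored as windows-1251 bytes was then decoded as Mac Roman. In windows-1251, "р" is the single byte 0xF0. Mac Roman maps 0xF0 to U+F8FF, the private-use Apple logo code point, and that code point has no Latin-1 byte. The sketch is written in Python 3, and the variable names are illustrative only.</p>

```python
# Hypothesis only: windows-1251 bytes were mis-decoded as Mac Roman.
ch = "\u0440"  # CYRILLIC SMALL LETTER ER

raw = ch.encode("windows-1251")    # the single windows-1251 byte for this letter
misread = raw.decode("mac_roman")  # what a Mac Roman reader makes of that byte
print(raw, hex(ord(misread)))

try:
    misread.encode("latin-1")      # a Latin-1 based repair needs this step
    print("representable in latin-1")
except UnicodeEncodeError:
    print("not representable in latin-1")
```

<p>If the hypothesis holds, this letter is exactly the kind of character that ends up in the "bad characters" branch of the per-character repair.</p>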
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8697</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:31:18 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8697</guid>
		<description>The final solution is to use the first variant, but wrap chr(ord(ch)) in a try-except block and replace the bad characters with "?". It's OK for me for now.</description>
		<content:encoded><![CDATA[<p>The final solution is to use the first variant, but wrap chr(ord(ch)) in a try-except block and replace the bad characters with &#8220;?&#8221;. It&#8217;s OK for me for now.</p>
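<p>Here is a minimal Python 3 sketch of this approach. The post's "first variant" is not reproduced in these comments, so the sketch assumes it re-decoded each character as a single windows-1251 byte. In Python 2, chr(ord(ch)) produces that one-byte string and raises ValueError above 255. Python 3's chr returns text, so the byte is built with bytes([ord(ch)]), which raises the same ValueError. The function name reenc_or_replace is hypothetical.</p>

```python
def reenc_or_replace(s, encoding="windows-1251"):
    """Re-decode each character of s as one byte in `encoding`.

    Characters that cannot be a single byte (code point above 255), or
    whose byte the codec cannot decode, are replaced with "?".
    """
    out = []
    for ch in s:
        try:
            # Python 3 spelling of the Python 2 chr(ord(ch)) trick
            out.append(bytes([ord(ch)]).decode(encoding))
        except (ValueError, UnicodeDecodeError):
            out.append("?")
    return "".join(out)


# Same test data as in the other comment: a "less or equal" sign plus
# windows-1251 bytes that were wrongly decoded as latin-1.
garbled = "\u2264" + b"\xc3\xeb\xe0\xe2\xe0".decode("latin-1")
print(ascii(reenc_or_replace(garbled)))
```

<p>Catching ValueError handles code points above 255, such as the "less or equal" sign. Catching UnicodeDecodeError handles any byte value the codec refuses to decode.</p>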
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8696</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:24:39 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8696</guid>
		<description>Don't worry, the code is correct. In my case, the source data is actually in the Mac Roman encoding, and it contains a character (less than or equal to, "≤") that is not in the Latin-1 encoding.

That is probably also why I failed with the simple solution: non-Latin-1 data can't be converted to Latin-1.</description>
		<content:encoded><![CDATA[<p>Don&#8217;t worry, the code is correct. In my case, the source data is actually in the Mac Roman encoding, and it contains a character (less than or equal to, &#8220;≤&#8221;) that is not in the Latin-1 encoding.</p>
<p>That is probably also why I failed with the simple solution: non-Latin-1 data can&#8217;t be converted to Latin-1.</p>
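<p>This Python 3 sketch illustrates both points. The ≤ sign exists in Mac Roman (byte 0xB2) but not in Latin-1, so a Latin-1 round trip cannot handle it. It also shows an alternative that the comments do not use. Suppose the garbled text came from decoding windows-1251 bytes as Mac Roman (an assumption, since the exact chain is not spelled out). Then encoding it back with the mac_roman codec instead of latin-1 restores the original bytes. The helper names fits_latin1 and fix_mac_roman_mojibake are hypothetical.</p>

```python
def fits_latin1(s):
    """Return True if every character of s has a Latin-1 byte."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def fix_mac_roman_mojibake(s):
    """Undo an assumed windows-1251 -> Mac Roman mis-decoding.

    Encode back to the original bytes with mac_roman, then decode them
    with the intended windows-1251 codec.
    """
    return s.encode("mac_roman").decode("windows-1251")


le = "\u2264"  # LESS-THAN OR EQUAL TO
print(fits_latin1(le), le.encode("mac_roman"))

# Simulate the assumed damage on a short Russian word, then repair it.
word = "\u0413\u043b\u0430\u0432\u0430"
garbled = word.encode("windows-1251").decode("mac_roman")
print(fits_latin1(garbled), ascii(fix_mac_roman_mojibake(garbled)))
```

<p>The design choice is simply to reverse the exact codec that did the damage. That only works if the damaging codec is known, which is why it is offered here as an alternative rather than as the author's fix.</p>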
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8695</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:16:37 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8695</guid>
		<description>Oh, god. The code is just invalid. For my purposes, it works:

&lt;pre&gt;
import codecs
w1251dec = codecs.getdecoder('windows-1251')

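def reenc(s):
  # Each character of s is assumed to be a windows-1251 byte that was
  # wrongly decoded as latin-1: get the byte back, then decode it properly.
  s2 = u''
  for ch in s:
    try:
      ch2 = w1251dec(ch.encode('latin-1'))[0]
      s2 = s2 + ch2
    except (UnicodeEncodeError, UnicodeDecodeError):
      # not a latin-1 character (e.g. u'\u2264'), or a byte undefined
      # in windows-1251: keep the character unchanged
      s2 = s2 + ch
  return s2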

s = unicode("\xc3\xeb\xe0\xe2\xe0", 'latin-1')
s = u'\u2264' + s
print s, repr(s)
s = reenc(s)
print s, repr(s)
&lt;/pre&gt;

Still TODO: how to convert a unicode string to a plain str (byte string) in a given encoding.</description>
		<content:encoded><![CDATA[<p>Oh, god. The code is just invalid. For my purposes, it works:</p>
<pre>
import codecs
w1251dec = codecs.getdecoder('windows-1251')

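def reenc(s):
  # Each character of s is assumed to be a windows-1251 byte that was
  # wrongly decoded as latin-1: get the byte back, then decode it properly.
  s2 = u''
  for ch in s:
    try:
      ch2 = w1251dec(ch.encode('latin-1'))[0]
      s2 = s2 + ch2
    except (UnicodeEncodeError, UnicodeDecodeError):
      # not a latin-1 character (e.g. u'\u2264'), or a byte undefined
      # in windows-1251: keep the character unchanged
      s2 = s2 + ch
  return s2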

s = unicode("\xc3\xeb\xe0\xe2\xe0", 'latin-1')
s = u'\u2264' + s
print s, repr(s)
s = reenc(s)
print s, repr(s)
</pre>
<p>Still TODO: how to convert a unicode string to a plain str (byte string) in a given encoding.</p>
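<p>One answer to the TODO, sketched in Python 3, where text is str and byte strings are bytes (in Python 2 the same call is unicode.encode). Encode the text with the target codec, and use the errors argument to decide what happens to characters that codec lacks. The sample string mirrors the one built in the script above.</p>

```python
text = "\u2264" + "\u0413\u043b\u0430\u0432\u0430"  # the ≤ sign, then a Russian word

# Default errors="strict": windows-1251 has no byte for the ≤ sign, so this raises.
try:
    text.encode("windows-1251")
except UnicodeEncodeError as exc:
    print("cannot encode position", exc.start)

# errors="replace" writes b"?" for each character the codec cannot encode.
data = text.encode("windows-1251", errors="replace")
print(data)
```

<p>Replacing with "?" at encode time gives the same result as the try-except approach from the later comment, but leaves the per-character loop to the codec.</p>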
]]></content:encoded>
	</item>
</channel>
</rss>
