<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: python, re-encoding incorrected encoded string</title>
	<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/</link>
	<description>advocating olpa's open source developments</description>
	<pubDate>Sun, 01 Aug 2010 00:14:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: Oleg</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-15617</link>
		<dc:creator>Oleg</dc:creator>
		<pubDate>Tue, 07 Jul 2009 04:32:49 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-15617</guid>
		<description>I've noticed that, after all the code changes, it looks like this:

&lt;pre&gt;
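import codecs

def reenc_next(cin, cout, i, s):
  # Decode byte i with both codecs and swap the resulting characters in s.
  try:
    ch  = chr(i)
    ch1 = cin(ch)[0]
    ch2 = cout(ch)[0]
    s = s.replace(ch1, ch2)
  except UnicodeError:
    # Byte i is undefined in one of the encodings: leave s unchanged.
    pass
  return s

def reenc_string(from_enc, to_enc, s):
  # Re-encode s, which was decoded as from_enc but is really to_enc.
  from_dec = codecs.getdecoder(from_enc)
  to_dec   = codecs.getdecoder(to_enc)
  for i in xrange(256):
    s = reenc_next(from_dec, to_dec, i, s)
  return s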
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>I&#8217;ve noticed that, after all the code changes, it looks like this:</p>
<pre>
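import codecs

def reenc_next(cin, cout, i, s):
  # Decode byte i with both codecs and swap the resulting characters in s.
  try:
    ch  = chr(i)
    ch1 = cin(ch)[0]
    ch2 = cout(ch)[0]
    s = s.replace(ch1, ch2)
  except UnicodeError:
    # Byte i is undefined in one of the encodings: leave s unchanged.
    pass
  return s

def reenc_string(from_enc, to_enc, s):
  # Re-encode s, which was decoded as from_enc but is really to_enc.
  from_dec = codecs.getdecoder(from_enc)
  to_dec   = codecs.getdecoder(to_enc)
  for i in xrange(256):
    s = reenc_next(from_dec, to_dec, i, s)
  return s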
</pre>
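<p>A minimal Python 3 sketch of the same per-byte idea, assuming single-byte encodings on both sides. The name <code>build_table</code> is hypothetical, and <code>str.translate</code> is swapped in for the repeated <code>replace</code> calls so that every character is mapped in a single pass and an already converted character cannot be replaced again by a later iteration:</p>

```python
def build_table(from_enc, to_enc):
    """Map each character produced by from_enc to the one to_enc gives for the same byte."""
    table = {}
    for i in range(256):
        b = bytes([i])
        try:
            table[ord(b.decode(from_enc))] = b.decode(to_enc)
        except UnicodeDecodeError:
            # Byte i is undefined in one of the encodings: skip it.
            continue
    return table


def reenc_string(from_enc, to_enc, s):
    """Re-decode s, which was read as from_enc although the bytes were in to_enc."""
    return s.translate(build_table(from_enc, to_enc))


# windows-1251 bytes that were wrongly decoded as latin-1,
# prefixed with a character that exists in neither mapping
garbled = b"\xc3\xeb\xe0\xe2\xe0".decode("latin-1")
fixed = reenc_string("latin-1", "windows-1251", "\u2264" + garbled)
print(ascii(fixed))
```

<p>Characters that are not in the table, such as U+2264 here, pass through unchanged, which mirrors what the <code>replace</code> loop does.</p>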
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8698</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:34:42 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8698</guid>
		<description>I can't believe it! The problematic symbol was actually the Russian small letter "r". Old-time programmers may remember the many problems this letter caused in the 1990s.

Thanks to Adobe for the reminder!</description>
		<content:encoded><![CDATA[<p>I can&#8217;t believe it! The problematic symbol was actually the Russian small letter &#8220;r&#8221;. Old-time programmers may remember the many problems this letter caused in the 1990s.</p>
<p>Thanks to Adobe for the reminder!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8697</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:31:18 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8697</guid>
		<description>The final solution is to use the first variant, but wrap chr(ord(ch)) in a try-except block and replace bad characters with "?". That's OK for me for now.</description>
		<content:encoded><![CDATA[<p>The final solution is to use the first variant, but wrap chr(ord(ch)) in a try-except block and replace bad characters with &#8220;?&#8221;. That&#8217;s OK for me for now.</p>
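<p>A hedged Python 3 sketch of that approach. The first variant is not shown in this thread, so the sketch assumes it round-trips the string through Latin-1 bytes; the function name <code>reenc_lossy</code> and its default encodings are hypothetical:</p>

```python
def reenc_lossy(s, wrong_enc="latin-1", real_enc="windows-1251"):
    """Undo a wrong decode, substituting "?" for characters wrong_enc cannot encode."""
    # errors="replace" makes encode() emit b"?" instead of raising UnicodeEncodeError.
    raw = s.encode(wrong_enc, errors="replace")
    # Bytes undefined in real_enc become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(real_enc, errors="replace")


# U+2264 is not in latin-1, so it is the "bad" character in this sample.
garbled = "\u2264" + b"\xc3\xeb\xe0\xe2\xe0".decode("latin-1")
print(ascii(reenc_lossy(garbled)))
```

<p>The built-in <code>errors</code> argument plays the role of the try-except block, so no explicit per-character loop is needed.</p>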
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8696</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:24:39 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8696</guid>
		<description>Don't worry, the code is correct. In my case, the source data is actually in Mac Roman encoding, and it contains a character (the less-than-or-equal sign) that does not exist in Latin-1.

That is probably also why the simple solution failed: I can't convert non-Latin-1 data to Latin-1.</description>
		<content:encoded><![CDATA[<p>Don&#8217;t worry, the code is correct. In my case, the source data is actually in Mac Roman encoding, and it contains a character (the less-than-or-equal sign) that does not exist in Latin-1.</p>
<p>That is probably also why the simple solution failed: I can&#8217;t convert non-Latin-1 data to Latin-1.</p>
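<p>The mismatch can be checked directly. A small Python 3 sketch, assuming the character meant here is U+2264 (the one used in the test code elsewhere in this thread); the helper name <code>can_encode</code> is hypothetical:</p>

```python
def can_encode(ch, encoding):
    """Return True if ch has a byte representation in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


le_sign = "\u2264"  # LESS-THAN OR EQUAL TO
print(can_encode(le_sign, "mac_roman"))
print(can_encode(le_sign, "latin-1"))
```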
]]></content:encoded>
	</item>
	<item>
		<title>By: olpa</title>
		<link>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8695</link>
		<dc:creator>olpa</dc:creator>
		<pubDate>Fri, 17 Aug 2007 11:16:37 +0000</pubDate>
		<guid>http://uucode.com/blog/2007/08/17/python-re-encoding-incorrected-encoded-string/#comment-8695</guid>
		<description>Oh, god. The code is simply invalid. For my purposes, this works:

&lt;pre&gt;
import codecs
w1251dec = codecs.getdecoder('windows-1251')

def reenc(s):
  s2 = ''
  for ch in s:
    try:
      ch2 = w1251dec(ch)[0]
      s2 = s2 + ch2
    except UnicodeEncodeError:
      s2 = s2 + ch
  return s2

s = unicode("\xc3\xeb\xe0\xe2\xe0", 'latin-1')
s = u'\u2264' + s
print s, repr(s)
s = reenc(s)
print s, repr(s)
&lt;/pre&gt;

Still TODO: how to convert a Unicode string to an ordinary string in a given encoding.</description>
		<content:encoded><![CDATA[<p>Oh, god. The code is simply invalid. For my purposes, this works:</p>
<pre>
import codecs
w1251dec = codecs.getdecoder('windows-1251')

def reenc(s):
  s2 = ''
  for ch in s:
    try:
      ch2 = w1251dec(ch)[0]
      s2 = s2 + ch2
    except UnicodeEncodeError:
      s2 = s2 + ch
  return s2

s = unicode("\xc3\xeb\xe0\xe2\xe0", 'latin-1')
s = u'\u2264' + s
print s, repr(s)
s = reenc(s)
print s, repr(s)
</pre>
<p>Still TODO: how to convert a Unicode string to an ordinary string in a given encoding.</p>
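<p>For that remaining step, a minimal Python 3 sketch: <code>str.encode</code> turns text into bytes in a named encoding (in Python 2 the equivalent is the <code>encode</code> method of a <code>unicode</code> object). The sample text and the choice of windows-1251 are assumptions for illustration:</p>

```python
text = "\u0413\u043b\u0430\u0432\u0430"  # five Cyrillic letters

# Encode to bytes; this raises UnicodeEncodeError if a character
# has no representation in the target encoding.
data = text.encode("windows-1251")
print(data)

# errors="replace" substitutes b"?" for unencodable characters instead.
lossy = ("\u2264" + text).encode("windows-1251", errors="replace")
print(lossy)
```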
]]></content:encoded>
	</item>
</channel>
</rss>
