Google Co-op and windows-1251

Google Co-op is a service for creating custom search engines. At the moment it officially supports only English, but other languages seem to work as well.

To work around the ASCII limit for a Custom Search Engine (CSE) name and description, one can use the XML interface: download the XML description, add the UTF-8-encoded text to it, and upload it back.

Another problem is searching directly from a site. My site uses the cp1251 (windows-1251) encoding, so browsers submit search requests in this encoding, but Google Co-op doesn't understand it.

Fortunately, I found a workaround. Instead of submitting the query directly to Google, the search form submits it to a PHP script. The script converts the query text from cp1251 to UTF-8 and redirects the browser to the CSE:

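<?php
// Base URL of the CSE; the query text is appended to it.
$pre = 'http://google.com/cse?cx=016263988511596419578:cpl03_53a24&sa=Search&cof=FORID:1&q=';
// The browser sends the query in cp1251 because the page uses that encoding.
$raw = isset($_GET['q']) ? $_GET['q'] : '';
// Convert to UTF-8; fall back to the raw text if the conversion fails.
$q = @iconv('cp1251', 'UTF-8', $raw);
if (FALSE === $q) {
  $q = $raw;
}
// Redirect the browser to the CSE with the URL-encoded UTF-8 query.
header('Location: ' . $pre . urlencode($q));
exit;
?>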
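
For completeness, the search form on the cp1251 page could look roughly like this. It is only a minimal sketch: the file name search.php is a hypothetical placeholder for wherever the script is saved, and the only detail taken from the script itself is the field name "q".

```html
<!-- search.php is a hypothetical name for the redirect script -->
<form action="search.php" method="get">
  <!-- The script reads the query text from the "q" parameter -->
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
```

Because the form uses the GET method, the browser puts the cp1251-encoded query into the URL, which is where the script expects to find it.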

5 November 2006, update

My addition to the discussion "Placing CSE Code to page with Windows-1251 encoding":

I suggest experimenting with the "accept" and "accept-charset" attributes of the "form" tag. I have no idea whether browsers support them, but the HTML 4 specification says:

accept-charset = charset list [CI]

This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received.

The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.

accept = content-type-list [CI]

This attribute specifies a comma-separated list of content types that a server processing this form will handle correctly. User agents may use this information to filter out non-conforming files when prompting a user to select files to be sent to the server (cf. the INPUT element when type="file").
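
As an illustration of the idea, a form that asks the browser to submit the query in UTF-8 directly to the CSE might look like the sketch below. It is untested: whether a given browser honors "accept-charset" is an open question here, and the hidden fields simply mirror the parameters of the CSE URL used in the PHP script.

```html
<!-- accept-charset asks the browser to encode the submitted data as UTF-8 -->
<form action="http://google.com/cse" method="get" accept-charset="UTF-8">
  <input type="hidden" name="cx" value="016263988511596419578:cpl03_53a24">
  <input type="hidden" name="cof" value="FORID:1">
  <input type="text" name="q">
  <input type="submit" name="sa" value="Search">
</form>
```

If the browser respects the attribute, the intermediate PHP script is not needed, because the query reaches Google already encoded as UTF-8.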

7 November 2006, update

Yet another suggestion: add this hidden field to the search form:


<input name="lr" type="hidden" value="lang_ru">