This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: A problem while using urllib


Steve Holden wrote:
Johnny Lee wrote:
[...]

I've sent the source, thanks for your help.


[...]
Preliminary result, in case this rings bells with people who use urllib2 quite a lot. I modified the error case to report the actual message returned with the exception and I'm seeing things like:


http://www.holdenweb.com/./Python/webframeworks.html
Message: <urlopen error (120, 'Operation already in progress')>
Start process http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
Error: IOError while parsing http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
Message: <urlopen error (120, 'Operation already in progress')>
.
.
.


So at least we know now what the error is, and it looks like some sort of resource limit (though why only on Cygwin beats me) ... anyone, before I start some serious debugging?

I realized after this post that WingIDE doesn't run under Cygwin, so I modified the code further to raise an error and give us a proper traceback. I also tested the program under the standard Windows Python 2.4.1 release, where it didn't fail, so I conclude you have unearthed a Cygwin socket bug. Here's the traceback:

End process http://www.holdenweb.com/contact.html
Start process http://freshmeat.net/releases/192449
Error: IOError while parsing http://freshmeat.net/releases/192449
   Message: <urlopen error (120, 'Operation already in progress')>
Traceback (most recent call last):
  File "Spider_bug.py", line 225, in ?
    spider.run()
  File "Spider_bug.py", line 143, in run
    self.grabUrl(tempUrl)
  File "Spider_bug.py", line 166, in grabUrl
    webPage = urllib2.urlopen(url).read()
  File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
  File "/usr/lib/python2.4/urllib2.py", line 376, in _open
    '_open', req)
  File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error (120, 'Operation already in progress')>

Looking at that part of the source of urllib2 we see:

        headers["Connection"] = "close"
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err: # XXX what error?
            raise URLError(err)

So my conclusion is that there's something in the Cygwin socket module that causes problems not seen under other platforms.
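
Because the except clause quoted above wraps the socket error in a URLError, the underlying error number is still reachable through the exception's reason attribute. The following is a minimal sketch of how one might pull it out, written for current Python 3 (urllib.error) rather than the Python 2.4 urllib2 in the traceback; the helper name describe_urlerror is hypothetical.

```python
from urllib.error import URLError


def describe_urlerror(err):
    """Return (errno, message) for a URLError.

    errno is None when the wrapped reason is not an OS-level error
    (for example, when it is a plain string).
    """
    reason = err.reason
    if isinstance(reason, OSError):
        # Socket errors are OSError instances in Python 3, so the
        # platform error number and its text are available directly.
        return reason.errno, reason.strerror
    return None, str(reason)
```

Applied to an error like the one in the log, this would separate the 120 from the 'Operation already in progress' text, which makes it easier to compare against the platform's errno values.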

I couldn't find any obviously-related error in the Python bug tracker, and I have copied this message to the Cygwin list in case someone there knows what the problem is.

Before making any kind of bug submission you should really see if you can build a program shorter than the existing 220+ lines to demonstrate the bug, but it does look to me like your program should work (as indeed it does on other platforms).
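
As a starting point for such a shorter program, here is a rough sketch that does nothing but open a list of URLs one after another and collect any URLError it sees. It is written for current Python 3 (urllib.request) rather than Python 2.4, the function name fetch_all and its opener parameter are hypothetical, and whether it actually triggers the Cygwin failure is untested.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_all(urls, opener=urlopen):
    """Fetch each URL in turn; return a list of (url, reason) for failures.

    The opener parameter exists only so the loop can be exercised
    without network access; by default it is urlopen.
    """
    failures = []
    for url in urls:
        try:
            # Mirrors the failing line in the spider:
            #     webPage = urllib2.urlopen(url).read()
            opener(url).read()
        except URLError as err:
            failures.append((url, err.reason))
    return failures
```

Calling fetch_all with the URLs from the log above, repeated enough times to open many connections in sequence, and printing the returned list would show whether the error appears without the rest of the spider involved.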

regards
 Steve
--
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
