pyquery

changeset 80:91a4330801b9 0.3.1

merge
author Gael Pasgrimaud <gael@gawel.org>
date Sat Jan 24 03:08:56 2009 +0100 (18 months ago)
parents 58b15bae680f 45ac7d97a0ae
children e756c934656d
files pyquery/README.txt pyquery/pyquery.py pyquery/test.py
line diff
     1.1 --- a/.hgtags	Sat Jan 24 03:00:22 2009 +0100
     1.2 +++ b/.hgtags	Sat Jan 24 03:08:56 2009 +0100
     1.3 @@ -1,1 +1,2 @@
     1.4  87f002ce396754a04a55b4dc8494f38100957108 0.2
     1.5 +9796ea9cb849ce66ca29b394b55a265fe2acb332 0.3
     2.1 --- a/pyquery/README.txt	Sat Jan 24 03:00:22 2009 +0100
     2.2 +++ b/pyquery/README.txt	Sat Jan 24 03:08:56 2009 +0100
     2.3 @@ -11,6 +11,18 @@
     2.4  
     2.5  It can be used for many purposes, one idea that I might try in the future is to
     2.6  use it for templating with pure http templates that you modify using pyquery.
     2.7 +It can also be used for web scraping or for theming applications with
     2.8 +`Deliverance`_.
     2.9 +
    2.10 +The `project`_ is being actively developed in a Mercurial repository on
    2.11 +Bitbucket. My policy is to give push access to anyone who wants it
    2.12 +and then review what they do. So if you want to contribute, just email me.
    2.13 +
    2.14 +The Sphinx documentation is available on `pyquery.org`_.
    2.15 +
    2.16 +.. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance
    2.17 +.. _project: http://www.bitbucket.org/olauzanne/pyquery/
    2.18 +.. _pyquery.org: http://pyquery.org/
    2.19  
    2.20  .. contents::
    2.21  
    2.22 @@ -42,7 +54,8 @@
    2.23      'you know Python rocks'
    2.24  
    2.25  You can use some of the pseudo classes that are available in jQuery but that
    2.26 -are not standard in css such as :first :last :even :odd :eq :lt :gt::
    2.27 +are not standard in css such as :first :last :even :odd :eq :lt :gt :checked
    2.28 +:selected :file::
    2.29  
    2.30      >>> d('p:first')
    2.31      [<p#hello.hello>]
    2.32 @@ -117,63 +130,18 @@
    2.33  Traversing
    2.34  ----------
    2.35  
    2.36 -Some jQuery traversal methods are supported.  For instance, you can filter the selection list
    2.37 -using a string selector::
    2.38 +Some jQuery traversal methods are supported.  Here are a few examples.
    2.39 +
    2.40 +You can filter the selection list using a string selector::
    2.41  
    2.42      >>> d('p').filter('.hello')
    2.43      [<p#hello.hello>]
    2.44  
    2.45 -Filtering can also be done using a function::
    2.46 -
    2.47 -    >>> d('p').filter(lambda i: i == 1)
    2.48 -    [<p#test>]
    2.49 -
    2.50 -Filtering functions can refer to the current element as 'this', like in jQuery::
    2.51 -
    2.52 -    >>> d('p').filter(lambda i: pq(this).text() == 'you know Python rocks')
    2.53 -    [<p#hello.hello>]
    2.54 -
    2.55 -The opposite of filter is `not_` - it returns the items that don't match the selector::
    2.56 -
    2.57 -    >>> d('p').not_('.hello')
    2.58 -    [<p#test>]
    2.59 -
    2.60 -You can map a callable onto a PyQuery and get a mutated result. The result can
    2.61 -contain any items, not just elements::
    2.62 -
    2.63 -    >>> d('p').map(lambda i, e: pq(e).text())
    2.64 -    ['you know Python rocks', 'hello python !']
    2.65 -
    2.66 -Like the filter method, map callbacks can reference the current item as this::
    2.67 -
    2.68 -    >>> d('p').map(lambda i, e: len(pq(this).text()))
    2.69 -    [21, 14]
    2.70 -
    2.71 -The map callback can also return a list, which will extend the resulting
    2.72 -PyQuery::
    2.73 -
    2.74 -    >>> d('p').map(lambda i, e: pq(this).text().split())
    2.75 -    ['you', 'know', 'Python', 'rocks', 'hello', 'python', '!']
    2.76 -
    2.77  It is possible to select a single element with eq::
    2.78  
    2.79      >>> d('p').eq(0)
    2.80      [<p#hello.hello>]
    2.81  
    2.82 -The `is_` method lets you query if any current elements match the selector::
    2.83 -
    2.84 -    >>> d('p').eq(0).is_('.hello')
    2.85 -    True
    2.86 -    >>> d('p').eq(1).is_('.hello')
    2.87 -    False
    2.88 -
    2.89 -hasClass allows for checking for the presence of a class by name::
    2.90 -
    2.91 -    >>> d('p').eq(0).hasClass('hello')
    2.92 -    True
    2.93 -    >>> d('p').eq(1).hasClass('hello')
    2.94 -    False
    2.95 -
    2.96  You can find nested elements::
    2.97  
    2.98      >>> d('p').find('a')
    2.99 @@ -331,17 +299,34 @@
   2.100  Making links absolute
   2.101  ---------------------
   2.102  
   2.103 -You can make all links on a page absolute which can be usefull for screen
   2.104 -scrapping::
   2.105 +You can make links absolute, which can be useful for screen scraping::
   2.106  
   2.107 -    >>> d = pq(url='http://google.com')
   2.108 -    >>> d('a:last').attr('href')
   2.109 -    '/intl/fr/privacy.html'
   2.110 +    >>> d = pq(url='http://www.w3.org/', parser='html')
   2.111 +    >>> d('a[title="W3C Activities"]').attr('href')
   2.112 +    '/Consortium/activities'
   2.113      >>> d.make_links_absolute()
   2.114      [<html>]
   2.115 -    >>> d('a:last').attr('href')
   2.116 -    'http://google.com/intl/fr/privacy.html'
   2.117 +    >>> d('a[title="W3C Activities"]').attr('href')
   2.118 +    'http://www.w3.org/Consortium/activities'
   2.119  
   2.120 +Using different parsers
   2.121 +-----------------------
   2.122 +
   2.123 +By default pyquery tries the lxml XML parser first and, if that fails, falls
   2.124 +back to the HTML parser from lxml.html. The XML parser can be problematic on
   2.125 +XHTML pages because it may not raise an error yet still return an unusable
   2.126 +tree (on w3c.org for example).
   2.127 +
   2.128 +You can also choose which parser to use explicitly::
   2.129 +
   2.130 +   >>> pq('<html><body><p>toto</p></body></html>', parser='xml')
   2.131 +   [<html>]
   2.132 +   >>> pq('<html><body><p>toto</p></body></html>', parser='html')
   2.133 +   [<html>]
   2.134 +   >>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
   2.135 +   [<p>]
   2.136 +
   2.137 +The html and html_fragments parsers are the ones from lxml.html.
   2.138  
   2.139  Testing
   2.140  -------
   2.141 @@ -363,24 +348,28 @@
   2.142  
   2.143      $ STATIC_DEPS=true bin/buildout
   2.144  
   2.145 -Other documentations
   2.146 ---------------------
   2.147 +More documentation
   2.148 +------------------
   2.149  
   2.150 -For more documentation about the API use the jquery website http://docs.jquery.com/
   2.151 +First there is the Sphinx documentation `here`_.
   2.152 +Then for more documentation about the API you can use the `jquery website`_.
   2.153 +The reference I'm now using for the API is ... the `color cheat sheet`_.
   2.154 +Then you can always look at the `code`_.
   2.155  
   2.156 -The reference I'm now using for the API is ... the color cheat sheet
   2.157 -http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png
   2.158 +.. _jquery website: http://docs.jquery.com/
   2.159 +.. _code: http://www.bitbucket.org/olauzanne/pyquery/src/tip/pyquery/pyquery.py
   2.160 +.. _here: http://pyquery.org
   2.161 +.. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png
   2.162  
   2.163  TODO
   2.164  ----
   2.165  
   2.166 -- SELECTORS: it works fine but missing all the :xxx (:first, :last, ...) can be
   2.167 -  done by patching lxml.cssselect
   2.168 +- SELECTORS: still missing some jQuery pseudo classes (:radio, :password, ...)
   2.169  - ATTRIBUTES: done
   2.170  - CSS: done
   2.171  - HTML: done
   2.172 -- MANIPULATING: did all but the "wrap" methods
   2.173 -- TRAVERSING: did a few
   2.174 +- MANIPULATING: missing the wrapAll and wrapInner methods
   2.175 +- TRAVERSING: about half done
   2.176  - EVENTS: nothing to do with server side might be used later for automatic ajax
   2.177  - CORE UI EFFECTS: did hide and show the rest doesn't really make sense on
   2.178    server side
     3.1 --- a/pyquery/cssselectpatch.py	Sat Jan 24 03:00:22 2009 +0100
     3.2 +++ b/pyquery/cssselectpatch.py	Sat Jan 24 03:08:56 2009 +0100
     3.3 @@ -36,6 +36,36 @@
     3.4          xpath.add_post_condition('position() mod 2 = 0')
     3.5          return xpath
     3.6  
     3.7 +    def _xpath_checked(self, xpath):
     3.8 +        """Matches all input elements that are checked.
     3.9 +        """
    3.10 +        xpath.add_condition("@checked and name(.) = 'input'")
    3.11 +        return xpath
    3.12 +
    3.13 +    def _xpath_selected(self, xpath):
    3.14 +        """Matches all elements that are selected.
    3.15 +        """
    3.16 +        xpath.add_condition("@selected and name(.) = 'option'")
    3.17 +        return xpath
    3.18 +
    3.19 +    def _xpath_disabled(self, xpath):
    3.20 +        """Matches all elements that are disabled.
    3.21 +        """
    3.22 +        xpath.add_condition("@disabled")
    3.23 +        return xpath
    3.24 +
    3.25 +    def _xpath_enabled(self, xpath):
    3.26 +        """Matches all input elements that are not disabled.
    3.27 +        """
    3.28 +        xpath.add_condition("not(@disabled) and name(.) = 'input'")
    3.29 +        return xpath
    3.30 +
    3.31 +    def _xpath_file(self, xpath):
    3.32 +        """Matches all input elements of type file.
    3.33 +        """
    3.34 +        xpath.add_condition("@type = 'file' and name(.) = 'input'")
    3.35 +        return xpath
    3.36 +
    3.37  cssselect.Pseudo = JQueryPseudo
    3.38  
    3.39  class JQueryFunction(Function):
     4.1 --- a/pyquery/pyquery.py	Sat Jan 24 03:00:22 2009 +0100
     4.2 +++ b/pyquery/pyquery.py	Sat Jan 24 03:08:56 2009 +0100
     4.3 @@ -5,16 +5,26 @@
     4.4  # Distributed under the BSD license, see LICENSE.txt
     4.5  from cssselectpatch import selector_to_xpath
     4.6  from lxml import etree
     4.7 +import lxml.html
     4.8  from copy import deepcopy
     4.9  from urlparse import urljoin
    4.10  
    4.11 -def fromstring(context):
    4.12 +def fromstring(context, parser=None):
    4.13      """use html parser if we don't have clean xml
    4.14      """
    4.15 -    try:
    4.16 -        return etree.fromstring(context)
    4.17 -    except etree.XMLSyntaxError:
    4.18 -        return etree.fromstring(context, etree.HTMLParser())
    4.19 +    if parser is None:
    4.20 +        try:
    4.21 +            return [etree.fromstring(context)]
    4.22 +        except etree.XMLSyntaxError:
    4.23 +            return [lxml.html.fromstring(context)]
    4.24 +    elif parser == 'xml':
    4.25 +        return [etree.fromstring(context)]
    4.26 +    elif parser == 'html':
    4.27 +        return [lxml.html.fromstring(context)]
    4.28 +    elif parser == 'html_fragments':
    4.29 +        return lxml.html.fragments_fromstring(context)
    4.30 +    else:
    4.31 +        raise ValueError('No such parser: "%s"' % parser)
    4.32  
    4.33  class NoDefault(object):
    4.34      def __repr__(self):
    4.35 @@ -59,6 +69,13 @@
    4.36          html = None
    4.37          elements = []
    4.38          self._base_url = None
    4.39 +        parser = kwargs.get('parser')
    4.40 +        if 'parser' in kwargs:
    4.41 +            del kwargs['parser']
    4.42 +        if not kwargs and len(args) == 1 and isinstance(args[0], basestring) \
    4.43 +           and args[0].startswith('http://'):
    4.44 +            kwargs = {'url': args[0]}
    4.45 +            args = []
    4.46  
    4.47          if 'parent' in kwargs:
    4.48              self._parent = kwargs.pop('parent')
    4.49 @@ -76,7 +93,7 @@
    4.50                  self._base_url = url
    4.51              else:
    4.52                  raise ValueError('Invalid keyword arguments %s' % kwargs)
    4.53 -            elements = [fromstring(html)]
    4.54 +            elements = fromstring(html, parser)
    4.55          else:
    4.56              # get nodes
    4.57  
    4.58 @@ -94,7 +111,7 @@
    4.59              # get context
    4.60              if isinstance(context, basestring):
    4.61                  try:
    4.62 -                    elements = [fromstring(context)]
    4.63 +                    elements = fromstring(context, parser)
    4.64                  except Exception, e:
    4.65                      raise ValueError('%r, %s' % (e, context))
    4.66              elif isinstance(context, self.__class__):
    4.67 @@ -164,7 +181,18 @@
    4.68      ##############
    4.69  
    4.70      def filter(self, selector):
    4.71 -        """Filter elements in self using selector (string or function)."""
    4.72 +        """Filter elements in self using selector (string or function).
    4.73 +
    4.74 +            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p>')
    4.75 +            >>> d('p')
    4.76 +            [<p.hello>, <p>]
    4.77 +            >>> d('p').filter('.hello')
    4.78 +            [<p.hello>]
    4.79 +            >>> d('p').filter(lambda i: i == 1)
    4.80 +            [<p>]
    4.81 +            >>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi')
    4.82 +            [<p.hello>]
    4.83 +        """
    4.84          if not callable(selector):
    4.85              return self.__class__(selector, self, **dict(parent=self))
    4.86          else:
    4.87 @@ -179,16 +207,35 @@
    4.88              return self.__class__(elements, **dict(parent=self))
    4.89  
    4.90      def not_(self, selector):
    4.91 -        """Return elements that don't match the given selector."""
    4.92 +        """Return elements that don't match the given selector.
    4.93 +
    4.94 +            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
    4.95 +            >>> d('p').not_('.hello')
    4.96 +            [<p>]
    4.97 +        """
    4.98          exclude = set(self.__class__(selector, self))
    4.99          return self.__class__([e for e in self if e not in exclude], **dict(parent=self))
   4.100  
   4.101      def is_(self, selector):
   4.102 -        """Returns True if selector matches at least one current element, else False."""
   4.103 +        """Returns True if selector matches at least one current element, else False.
   4.104 +            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
   4.105 +            >>> d('p').eq(0).is_('.hello')
   4.106 +            True
   4.107 +            >>> d('p').eq(1).is_('.hello')
   4.108 +            False
   4.109 +        """
   4.110          return bool(self.__class__(selector, self))
   4.111  
   4.112      def find(self, selector):
   4.113 -        """Find elements using selector traversing down from self."""
   4.114 +        """Find elements using selector traversing down from self.
   4.115 +
   4.116 +            >>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
   4.117 +            >>> d = PyQuery(m)
   4.118 +            >>> d('p').find('em')
   4.119 +            [<em>, <em>]
   4.120 +            >>> d('p').eq(1).find('em')
   4.121 +            [<em>]
   4.122 +        """
   4.123          xpath = selector_to_xpath(selector)
   4.124          results = [child.xpath(xpath) for tag in self for child in tag.getchildren()]
   4.125          # Flatten the results
   4.126 @@ -198,7 +245,14 @@
   4.127          return self.__class__(elements, **dict(parent=self))
   4.128  
   4.129      def eq(self, index):
   4.130 -        """Return PyQuery of only the element with the provided index."""
   4.131 +        """Return PyQuery of only the element with the provided index.
   4.132 +
   4.133 +            >>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
   4.134 +            >>> d('p').eq(0)
   4.135 +            [<p.hello>]
   4.136 +            >>> d('p').eq(1)
   4.137 +            [<p>]
   4.138 +        """
   4.139          return self.__class__([self[index]], **dict(parent=self))
   4.140  
   4.141      def each(self, func):
   4.142 @@ -213,6 +267,16 @@
   4.143  
   4.144          func should take two arguments - 'index' and 'element'.  Elements can
   4.145          also be referred to as 'this' inside of func.
   4.146 +
   4.147 +            >>> d = PyQuery('<p class="hello">Hi there</p><p>Bye</p><br />')
   4.148 +            >>> d('p').map(lambda i, e: PyQuery(e).text())
   4.149 +            ['Hi there', 'Bye']
   4.150 +
   4.151 +            >>> d('p').map(lambda i, e: len(PyQuery(this).text()))
   4.152 +            [8, 3]
   4.153 +
   4.154 +            >>> d('p').map(lambda i, e: PyQuery(this).text().split())
   4.155 +            ['Hi', 'there', 'Bye']
   4.156          """
   4.157          items = []
   4.158          try:
   4.159 @@ -236,6 +300,13 @@
   4.160          return len(self)
   4.161  
   4.162      def end(self):
   4.163 +        """Break out of a level of traversal and return to the parent level.
   4.164 +
   4.165 +            >>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
   4.166 +            >>> d = PyQuery(m)
   4.167 +            >>> d('p').eq(1).find('em').end().end()
   4.168 +            [<p>, <p>]
   4.169 +        """
   4.170          return self._parent
   4.171  
   4.172      ##############
   4.173 @@ -650,7 +721,7 @@
   4.174  
   4.175          """
   4.176          assert isinstance(value, basestring)
   4.177 -        value = fromstring(value)
   4.178 +        value = fromstring(value)[0]
   4.179          nodes = []
   4.180          for tag in self:
   4.181              wrapper = deepcopy(value)
   4.182 @@ -685,7 +756,7 @@
   4.183              return self
   4.184  
   4.185          assert isinstance(value, basestring)
   4.186 -        value = fromstring(value)
   4.187 +        value = fromstring(value)[0]
   4.188          wrapper = deepcopy(value)
   4.189          if not wrapper.getchildren():
   4.190              child = wrapper
     5.1 --- a/pyquery/test.py	Sat Jan 24 03:00:22 2009 +0100
     5.2 +++ b/pyquery/test.py	Sat Jan 24 03:08:56 2009 +0100
     5.3 @@ -79,6 +79,27 @@
     5.4             </html>
     5.5             """
     5.6  
     5.7 +    html4 = """
     5.8 +           <html>
     5.9 +            <body>
    5.10 +              <form action="/">
    5.11 +                <input name="enabled" type="text" value="test"/>
    5.12 +                <input name="disabled" type="text" value="disabled" disabled="disabled"/>
    5.13 +                <input name="file" type="file" />
    5.14 +                <select name="select">
    5.15 +                  <option value="">Choose something</option>
    5.16 +                  <option value="one">One</option>
    5.17 +                  <option value="two" selected="selected">Two</option>
    5.18 +                  <option value="three">Three</option>
    5.19 +                </select>
    5.20 +                <input name="radio" type="radio" value="one"/>
    5.21 +                <input name="radio" type="radio" value="two" checked="checked"/>
    5.22 +                <input name="radio" type="radio" value="three"/>
    5.23 +              </form>
    5.24 +            </body>
    5.25 +           </html>
    5.26 +           """
    5.27 +
    5.28      def test_selector_from_doc(self):
    5.29          doc = etree.fromstring(self.html)
    5.30          assert len(self.klass(doc)) == 1
    5.31 @@ -118,6 +139,14 @@
    5.32          self.assertEqual(e('div:lt(1)').text(), 'node1')
    5.33          self.assertEqual(e('div:eq(2)').text(), 'node3')
    5.34  
    5.35 +        #test on the form
    5.36 +        e = self.klass(self.html4)
    5.37 +        assert len(e(':disabled')) == 1
    5.38 +        assert len(e('input:enabled')) == 5
    5.39 +        assert len(e(':selected')) == 1
    5.40 +        assert len(e(':checked')) == 1
    5.41 +        assert len(e(':file')) == 1
    5.42 +
    5.43  class TestTraversal(unittest.TestCase):
    5.44      klass = pq
    5.45      html = """
     6.1 --- a/setup.py	Sat Jan 24 03:00:22 2009 +0100
     6.2 +++ b/setup.py	Sat Jan 24 03:08:56 2009 +0100
     6.3 @@ -9,7 +9,7 @@
     6.4  
     6.5  long_description = open(os.path.join('pyquery', 'README.txt')).read()
     6.6  
     6.7 -version = '0.2'
     6.8 +version = '0.3'
     6.9  
    6.10  setup(name='pyquery',
    6.11        version=version,
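
The new form pseudo classes in cssselectpatch.py each add one XPath condition (for example "@checked and name(.) = 'input'" for :checked). The sketch below restates those conditions as plain Python predicates over the html4 fixture from test.py, so the counts asserted in the new test can be checked by hand. It is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, it is written for Python 3 rather than the Python 2 of this changeset, and the names FORM, PSEUDO and match are hypothetical and not part of pyquery.

```python
import xml.etree.ElementTree as ET

# Copy of the html4 fixture added to pyquery/test.py in this changeset.
FORM = """
<html>
 <body>
  <form action="/">
    <input name="enabled" type="text" value="test"/>
    <input name="disabled" type="text" value="disabled" disabled="disabled"/>
    <input name="file" type="file" />
    <select name="select">
      <option value="">Choose something</option>
      <option value="one">One</option>
      <option value="two" selected="selected">Two</option>
      <option value="three">Three</option>
    </select>
    <input name="radio" type="radio" value="one"/>
    <input name="radio" type="radio" value="two" checked="checked"/>
    <input name="radio" type="radio" value="three"/>
  </form>
 </body>
</html>
"""

# Hypothetical predicates, one per pseudo class, mirroring the XPath
# conditions added in cssselectpatch.py (these names are not pyquery API).
PSEUDO = {
    "checked": lambda el: el.tag == "input" and "checked" in el.attrib,
    "selected": lambda el: el.tag == "option" and "selected" in el.attrib,
    "disabled": lambda el: "disabled" in el.attrib,
    "enabled": lambda el: el.tag == "input" and "disabled" not in el.attrib,
    "file": lambda el: el.tag == "input" and el.get("type") == "file",
}


def match(root, pseudo):
    """Return every element at or below root that satisfies the predicate."""
    test = PSEUDO[pseudo]
    return [el for el in root.iter() if test(el)]


root = ET.fromstring(FORM.strip())
counts = {name: len(match(root, name)) for name in PSEUDO}
print(counts)
# {'checked': 1, 'selected': 1, 'disabled': 1, 'enabled': 5, 'file': 1}
```

The count of 5 for "enabled" comes from the six input elements minus the one carrying a disabled attribute, which agrees with the assertion on 'input:enabled' in the new test. Note that, as written in the diff, :enabled only ever matches input elements, while :disabled matches any element with a disabled attribute.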
