Question HTTP/HTML libraries for C++

agentgonzo

Does anyone know of a good set of libraries for performing HTTP GETs from a C++ program and then parsing the resulting web page? It doesn't need to be too fancy. It's just a case of downloading a webpage and then parsing it so that it's easier to scan for content.

Thanks.

It must be free to redistribute (GPL, LGPL, etc.).
 
Maybe this will help:
WebKit - Wikipedia, the free encyclopedia: http://en.wikipedia.org/wiki/WebKit
 
WebKit seems to be more of a layout engine. I need something more lightweight that will fetch webpages and parse the HTML in a simple fashion.
 
libcurl. It also has C++ bindings. As for HTML parsing, perhaps you could use a small XML parser like expat? I used the C version of this: http://libxmlplusplus.sourceforge.net/

And BTW, funny you should ask, because I want to do a similar project soon.
 
Not being a C++ purist, I use .NET for this (which has the added advantage of easy-to-use regex classes), or I call wget and pipe its output to my program.
 
You could create a .NET app just to do the grinding for you, then call it "commando" (from the command line) and have it generate a file or return whatever you need...

Then you don't have to go .NET for everything, and (best of all) you don't have to drudge with InterOp. It's just a shell invocation deal... pretty standard stuff :hmm:


I too sometimes have trouble finding "simple" things like this... it seems most libraries designed for this type of "everyday stuff" are terribly bloated and frameworky... less than fit for convenient use as part of a larger project...


Well, just to add to that, have you checked out the Chromium framework? It's more of a full browser than an HTML reader... but it might be worth checking out, I think :rolleyes:
 
You could create a .NET app just to do the grinding for you, then call it "commando" (from the command line) and have it generate a file or return whatever you need...
Not being a C++ purist, I use .NET for this (which has the added advantage of easy-to-use regex classes), or I call wget and pipe its output to my program.
Yeah, it's an addition to a C++ program, so saying 'use .NET' doesn't help. And no, I'm not going to use C++.NET, as that is one of the most hideous hacks that I have ever seen and it's not worth porting the project for.

Calling an external program is just an ugly hack.

Well, just to add to that, have you checked out the Chromium framework? It's more of a full browser than an HTML reader... but it might be worth checking out, I think :rolleyes:
No, I didn't bother because I don't need a full HTML reader at all. All I need to do is check a small piece of text on the website, which is why I want it to be lightweight.
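
For a check on a small piece of text like this, a full parser may not be needed once the page is in a string. A rough sketch, assuming the text of interest sits between two known markers; `text_between` is a hypothetical helper written for illustration and uses only the standard library:

```cpp
#include <cstddef>
#include <string>

// Copy the text between the first occurrence of `before` and the next
// occurrence of `after` into `out`. Returns false (and leaves `out`
// untouched) if either marker is missing.
bool text_between(const std::string& page,
                  const std::string& before,
                  const std::string& after,
                  std::string& out) {
    const std::size_t start = page.find(before);
    if (start == std::string::npos) return false;
    const std::size_t from = start + before.size();
    const std::size_t end = page.find(after, from);
    if (end == std::string::npos) return false;
    out = page.substr(from, end - from);
    return true;
}
```

This is plain substring matching, not HTML parsing, so it only holds up while the markup around the text stays the same.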

Thanks to all those with suggestions.
 