Forum PHP.pl _ Object-oriented programming _ Merged data

Posted by: SN@JPER^ 24.11.2017, 17:44:49

I wrote the following class:

    <?php

    class Scrapper{

        public $url;
        private $data;
        private $dataAfter;
        private $doc;
        private $xpath;
        private $ch;

        function __construct($url){

            if (preg_match('/^http/', $url)) {

                libxml_use_internal_errors(true);

                $this->url = $url;
                $this->data = $this->curl($this->url);

                $this->doc = new \DOMDocument();
                $this->doc->loadHTML($this->data);

                $this->xpath = new DOMXPath($this->doc);

            }
        }

        public function queryTag($query){

            if(!empty($query)){

                $this->data = $this->xpath->query($query);

                return $this;
            }
        }

        public function getData($noHTML = false, $removeAttribute = false){

            foreach ($this->data as $dataNodes){

                if($removeAttribute === true) {
                    $dataNodes->removeAttribute('style');
                    $dataNodes->removeAttribute('class');
                    $dataNodes->removeAttribute('id');
                }

                if($noHTML === true){
                    $this->dataAfter .= $dataNodes->nodeValue;
                }else{
                    $this->dataAfter .= $dataNodes->ownerDocument->saveHTML($dataNodes);
                }

            }

            return $this->dataAfter;
        }

        private function curl($url){
            if(!empty($url)) {

                $options = array(
                    CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
                    CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
                    CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
                    CURLOPT_CONNECTTIMEOUT => 120, // Setting the amount of time (in seconds) before the request times out
                    CURLOPT_TIMEOUT => 120, // Setting the maximum amount of time for cURL to execute queries
                    CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
                    CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8", // Setting the useragent
                    CURLOPT_URL => $this->url, // Setting cURL's URL option with the $url variable passed into the function
                );

                $this->ch = curl_init();
                curl_setopt_array($this->ch, $options);
                $this->data = curl_exec($this->ch);

                return $this->data;
            }
        }

        function __destruct(){

            curl_close($this->ch);

        }

    }


    $class = new \Scrapper('http://www.....');

    $pic = $class->queryTag('//div[@id="left"]//img[@class="pic"]/@src')->getData();
    $title = $class->queryTag('//div[@id="left"]//h2')->getData(true);
    $text = $class->queryTag('//div[@id="left"]/p | //center')->getData(false, true);

    echo $title;
    echo '<hr>';
    echo $pic;
    echo '<hr>';
    echo $text;
    echo '<hr>';


After instantiating this class, I assign each value I am looking for to its own variable: the picture, the title and the text.

Unfortunately, the title also contains the image URL string, and the text additionally contains both the image and the title. Where is my mistake? How do I keep them separate?

I would also appreciate suggestions on what I could improve in the class itself.

Posted by: trueblue 24.11.2017, 18:59:11

Show a piece of the structure you are parsing.

Posted by: Pyton_000 24.11.2017, 19:05:40

As far as I'm concerned, the class itself should be scrapped and rewritten ;)

Posted by: SN@JPER^ 24.11.2017, 19:51:31

Quote(trueblue @ 24.11.2017, 18:59:11)
Show a piece of the structure you are parsing.


A simple example: https://www.tehplayground.com/SCtDYOUp67t0EPHt


Quote(Pyton_000 @ 24.11.2017, 19:05:40)
As far as I'm concerned, the class itself should be scrapped and rewritten ;)


What do you suggest?

Posted by: trueblue 24.11.2017, 20:08:08

You keep appending data to dataAfter: it is never reset, so every getData() call returns the results of all previous calls as well.
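
To make the effect visible in isolation, here is a minimal sketch. The Collector class and its collect() method are hypothetical stand-ins, not part of the original Scrapper class; they keep only the accumulation logic and use plain strings in place of DOM nodes.

```php
<?php

// Hypothetical stand-in for Scrapper: only the accumulation logic is kept.
class Collector
{
    // Lives as long as the object, so it is shared by all calls.
    private $dataAfter = '';

    // Mirrors the original getData(): appends to the property, never resets it.
    public function collect(array $nodes)
    {
        foreach ($nodes as $node) {
            $this->dataAfter .= $node;
        }
        return $this->dataAfter;
    }
}

$c = new Collector();
$pic   = $c->collect(['pic.jpg']); // 'pic.jpg'
$title = $c->collect(['Title']);   // 'pic.jpgTitle': the first result leaks in
```

Each call on the same object continues where the previous one stopped, which matches the symptom described above: the title is prefixed with the image URL.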

Posted by: SN@JPER^ 24.11.2017, 20:15:40

It works after I changed it to:

    public function getData($noHTML = false, $removeAttribute = false){

        $data_after1 = '';
        foreach ($this->data as $dataNodes){

            if($removeAttribute === true) {
                $dataNodes->removeAttribute('style');
                $dataNodes->removeAttribute('class');
                $dataNodes->removeAttribute('id');
            }

            if($noHTML === true){
                $data_after1 .= $dataNodes->nodeValue;
            }else{
                $data_after1 .= $dataNodes->ownerDocument->saveHTML($dataNodes);
            }

        }

        return $data_after1;
    }
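
For anyone who wants to check the behaviour without fetching a page, the sketch below runs the same XPath queries against an inline HTML string. The markup is an assumption (the real page structure is only available behind the link above), and collectText() is a hypothetical helper that mirrors the local-buffer approach of the corrected getData() in its $noHTML = true mode.

```php
<?php

// Assumed sample markup; the structure of the real page is not shown in the thread.
$html = '<div id="left"><img class="pic" src="pic.jpg"><h2>Title</h2><p class="x">Text</p></div>';

libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: concatenates the text of all matched nodes.
// The buffer is local, so every call starts from an empty string.
function collectText(DOMXPath $xpath, $query)
{
    $buffer = '';
    foreach ($xpath->query($query) as $node) {
        $buffer .= $node->nodeValue;
    }
    return $buffer;
}

$pic   = collectText($xpath, '//div[@id="left"]//img[@class="pic"]/@src');
$title = collectText($xpath, '//div[@id="left"]//h2');
$text  = collectText($xpath, '//div[@id="left"]/p');
```

With the buffer scoped to the call, the three variables hold only their own query results, regardless of the order in which the queries run.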
