且构网

Sharing the things programmers run into during development...

Downloading from *** with regular expressions in PHP

Updated: 2023-02-23 13:52:11

I am using the following code to download a *** video.

    <?php
    header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
    header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past

    require_once('lib/***.lib.php');

    if (preg_match('/***\.com/i', $_GET['url'])) {
        if (!preg_match('/www\./i', $_GET['url'])) {
            $_GET['url'] = str_replace('http://', 'http://www.', $_GET['url']);
        }
        list($video_id, $download_link) = get_***($_GET['url']);
    } else {
        die('<span style="color:red;">Sorry, the URL is not recognized..</span>');
    }
    ?>


    <p>
    <img src="http://img.***.com/vi/<?php echo trim($video_id);?>/1.jpg" alt="Preview 1" class="ythumb" />
    <img src="http://img.***.com/vi/<?php echo trim($video_id);?>/2.jpg" alt="Preview 2" class="ythumb" />
    <img src="http://img.***.com/vi/<?php echo trim($video_id);?>/3.jpg" alt="Preview 3" class="ythumb" />
    </p>
    <p>
    <a href="<?php echo trim($download_link);?>" class="ydl" title="Download as FLV">Download FLV</a>
    <a href="<?php echo trim($download_link);?>&fmt=35" class="ydl" title="Download as MP4">Download MP4</a>
    <a href="<?php echo trim($download_link);?>&fmt=17" class="ydl" title="Download as 3GP">Download 3GP</a>
    </p>

My get_*** function is defined in the ***.lib.php file, which contains the following code:

    <?php

    function get_content_of_url($url){
        $ohyeah = curl_init();
        curl_setopt($ohyeah, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ohyeah, CURLOPT_URL, $url);
        $data = curl_exec($ohyeah);
        curl_close($ohyeah);
        //print_r($data);
        return $data;
    }

    function get_flv_link($string) {
        if (preg_match('/watch_fullscreen(.*)plid/i', $string, $out)){
            if (!preg_match('/watch_fullscreen(.*)plid/i', $data, $out)) {
                $outdata = $out[1];
                echo '1'.'<br>';
                $arrs = (explode('&',$outdata));
                foreach($arrs as $arr){
                    list($i,$x) = explode("=",$arr);
                    $$i = $x;
                }
                $link = 'http://www.***.com/get_video?video_id='.$video_id.'&t='.$t;
                echo '2';
                echo $link;
                array($video_id,$link);
                return array($video_id,$link);
            }
        }
    }

    function get_***($url){
        $stream = get_content_of_url($url);
        return get_flv_link($stream);
    }

    ?>

The output puzzles me. No error is displayed, but I still get nothing useful. The markup

> <a href="<?php echo trim($download_link);?>&fmt=17" class="ydl"
> title="Download as 3GP">Download 3GP</a>

is rendered in the result, but the link points to localhost.

It seems that I am missing some trick. I found this script while trying to learn PHP and cURL. Any suggestions? Could you please help me turn this into working code?

Thanks

The problem isn't the regex; the problem is the assumptions the library makes about ***'s page format. Replace:

    }

    function get_***($url){

with

        else { echo "No match"; }
    }

    function get_***($url){

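(which adds an else to the outer if in get_flv_link()) and you'll get a message saying that the content doesn't match. This probably means that *** have altered their page format (possibly because people were trying to scrape it like this). When nothing matches, get_flv_link() falls off the end and returns null, so $download_link is empty and each href collapses to something like "&fmt=17". The browser resolves that relative URL against the current page, which is why the links point to localhost.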
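One way to surface this failure in the calling page, rather than rendering broken links, is to check the return value before using it. The following is a minimal sketch, assuming get_***() returns null when nothing matched; build_download_links() is a hypothetical helper that is not part of the original script, and the example.com URL is a placeholder.

```php
<?php
// Hypothetical helper (not in the original script): turn the scraper's
// result into the three download links, or null if scraping failed.
function build_download_links(?array $result): ?array
{
    // A null result means the page did not match the expected format.
    if ($result === null) {
        return null;
    }

    // The scraper returns [video id, base download link].
    [$videoId, $link] = $result;
    if (!is_string($link) || $link === '') {
        return null; // never emit an empty href
    }

    // The fmt values are the ones used in the original page template.
    return [
        'FLV' => $link,
        'MP4' => $link . '&fmt=35',
        '3GP' => $link . '&fmt=17',
    ];
}

// Usage with a placeholder result; in the real page this would be the
// return value of the scraping function.
$links = build_download_links(['abc123', 'http://example.com/get_video?video_id=abc123']);
if ($links === null) {
    echo "Sorry, no download link could be extracted.\n";
} else {
    foreach ($links as $label => $href) {
        echo $label . ': ' . htmlspecialchars($href) . "\n";
    }
}
```

With a guard like this, a change in the page format produces a visible message instead of links that silently point back at the current host.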

Also, I'm confused by:

    if (preg_match('/watch_fullscreen(.*)plid/i', $string, $out)){
        if (!preg_match('/watch_fullscreen(.*)plid/i', $data, $out)) {

That says "if it matches, then check that it doesn't match on a different variable (which hasn't been declared at this point)". The second call also overwrites $out, so $out[1] would no longer hold the captured text even when the page does match. Even if the page content matched the expected content, you'd probably have other bits that needed fixing in the library.
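For illustration, the nested check can be collapsed into a single match against the argument that was actually passed in. This is only a sketch under the assumption that the page still embeds a `watch_fullscreen...plid` fragment containing `video_id` and `t` parameters, which, as noted above, may no longer be true. It swaps the variable-variable loop (`$$i = $x`) for parse_str(), makes the capture group lazy, and returns null on failure; none of these changes come from the original library.

```php
<?php
// Sketch of a corrected get_flv_link(), assuming the old page format.
function get_flv_link(string $html): ?array
{
    // One match, against the argument that was passed in. The lazy
    // (.*?) stops at the first "plid" instead of the last one on the line.
    if (!preg_match('/watch_fullscreen(.*?)plid/i', $html, $out)) {
        return null; // page format not recognized
    }

    // parse_str() fills $params with key => value pairs, so no
    // page-controlled names are turned into local variables.
    parse_str(trim($out[1], '?&'), $params);

    if (!isset($params['video_id'], $params['t'])) {
        return null; // the expected fields are missing
    }

    $link = 'http://www.***.com/get_video?video_id=' . urlencode($params['video_id'])
          . '&t=' . urlencode($params['t']);

    return [$params['video_id'], $link];
}

// Usage with a made-up fragment in the assumed format.
$sample = 'watch_fullscreen?video_id=abc123&t=tok456&plid=xyz';
var_dump(get_flv_link($sample));

// A page without the fragment yields NULL rather than an empty link.
var_dump(get_flv_link('<html>no player data</html>'));
```

Returning null explicitly lets the caller distinguish "page format not recognized" from a valid result instead of silently printing empty links.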