If you had to loop through these arrays, which method would you choose?

Which of these loops works faster for a large array?

$data = array(); // 100,000 rows

//Method one

$chunk = array_chunk($data, 1000);

foreach ($chunk as $ch) {
    foreach ($ch as $row) {
        // send email
        $email = $row['email'];
        send_email($email);
    }
}

//Method two

$chunk = array_chunk($data, 1000);

$chunkTotal = count($chunk); // which gives 100
$i = 0;
$stopped = 0; // get number of stopped chunks from the server session or database
$resume = $chunkTotal - $stopped;

for ($i = 0; $i <= $chunkTotal; $i++) {
    $email0 = isset($chunk[0]['email']) ? $chunk[0]['email'] : '';
    send_email($email0);
    $email1 = isset($chunk[1]['email']) ? $chunk[1]['email'] : '';
    send_email($email1);
    $email2 = isset($chunk[2]['email']) ? $chunk[2]['email'] : '';
    send_email($email2);
    // ........ till $chunk[1000]['email']
}

Which of the two methods works best for you? If you have another, better and faster method, you can show it as well.

Thanks, house.

Method two is unmaintainable. If the size of $data changes, you’d have to add or delete a commensurate number of code lines, and your script wouldn’t work correctly until you did.

(though you seem to have confused yourself with a for loop?)

Also I sincerely hope you’re not sending 100,000 individual emails on a regular basis?

I’d expect that the second code example wouldn’t have each of the 1000 email calls hand-coded like that; it would be another for loop to work through each $chunk array. And wouldn’t it run 0-999 rather than 0-1000? It would need to run to count($chunk[n]) in case the main array doesn’t happen to have exactly the expected number of elements.

I’d expect that the biggest thing affecting how much time either of these take is the sending of the emails. If you want to see which actual looping method is faster, comment out the code that sends the emails and just run the loops, and time them. I wouldn’t be surprised if each attempt to send an email takes a different amount of time, just because you’re adding things in which are out of your control, such as how quickly the SMTP server responds.
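
For instance, a bare-bones harness along these lines would take that measurement. This is a sketch only: send_email() is stubbed to a no-op and the 100,000 rows are faked, so only the looping itself gets timed.

function send_email($email) { /* no-op stub so only the loop is timed */ }

// Fake a 100,000-row data set.
$data = array();
for ($n = 0; $n < 100000; $n++) {
    $data[] = array('email' => "user{$n}@example.com");
}

// Method one: chunk, then nested foreach.
$start = microtime(true);
foreach (array_chunk($data, 1000) as $ch) {
    foreach ($ch as $row) {
        send_email($row['email']);
    }
}
printf("Chunked foreach: %.4f seconds\n", microtime(true) - $start);

// A plain single foreach over the whole array, for comparison.
$start = microtime(true);
foreach ($data as $row) {
    send_email($row['email']);
}
printf("Plain foreach:   %.4f seconds\n", microtime(true) - $start);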

You could probably make the first one very slightly quicker by changing this

$email = $row['email'];
send_email($email);

to this

send_email($row['email']);

and not creating a variable for no apparent reason.

Two for loops vs two foreaches is… almost entirely semantics?

This bunch of stuff doesn’t do anything:

  - $chunkTotal is only used in the for loop initializer and can be substituted.
  - $i is defined by the loop and is moot.
  - $stopped is a value to be substituted.
  - $resume is never used.
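
Stripped of those dead variables, and with the hand-coded sends replaced by the inner loop suggested above, method two reduces to a sketch like this, which is functionally identical to method one:

$chunk = array_chunk($data, 1000);

// Note < rather than <=, so the index stays in range.
for ($i = 0; $i < count($chunk); $i++) {
    foreach ($chunk[$i] as $row) {
        send_email($row['email']);
    }
}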

Yeah, the send_email function uses a queue mechanism, so it logs excess messages to be resent another day.

As for the hard-coding, it maintains array keys 0-999, using an isset() check before proceeding.

Even if the array grows to 500,000 and is chunked into 1000s, the code still works, as nothing has to be altered.

As far as I can tell, all array_chunk does here is duplicate the array into several more arrays, which you then loop through one by one. So it uses more memory but isn’t actually any faster. If anything, using array_chunk makes it slower.
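
If you want to see that cost, a rough sketch like this compares memory usage before and after chunking; exact numbers will vary with PHP version and copy-on-write behaviour.

$data = array_fill(0, 100000, array('email' => 'user@example.com'));

$before = memory_get_usage();
$chunks = array_chunk($data, 1000);
$after  = memory_get_usage();

// The chunked copy is pure overhead: the same rows, held twice over.
printf("Extra memory held by the chunks: %.2f MB\n", ($after - $before) / 1048576);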

I never knew that extra variables had an impact on code or performance.

I’d like to know why, because it seems I’m overusing variables in most of my code.

This is another dimension to the whole case; I’m puzzled.

Most recommendations suggest chunking into smaller parts, but your remarks suggest it makes performance slower.

Do you have a valid reason behind your assumption?

When you define a variable, PHP has to put it into its variable table. That takes up space, and time. It’s small, microscopically so, but it happens.

I’m guessing most recommendations are suggesting chunking it into smaller parts and operating on one chunk per execution.
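
Something along these lines, say, where each run processes one slice and saves an offset so the next run picks up where this one stopped. A sketch only: the offset file and chunk size are illustrative choices, and $data and send_email() are assumed from the earlier posts; in real code the slice would likely be a LIMIT/OFFSET database query.

$offsetFile = __DIR__ . '/email_offset.txt';
$chunkSize  = 1000;

// Where the previous execution stopped (0 on a fresh start).
$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

// One slice per run.
$rows = array_slice($data, $offset, $chunkSize);

foreach ($rows as $row) {
    send_email($row['email']);
}

// Record how far this run got, for the next execution to pick up.
file_put_contents($offsetFile, (string) ($offset + count($rows)));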

Functionally, there is no difference between:

for ($x = 0; $x < 1000; $x++) {
    for ($i = 0; $i < 1000; $i++) {
        echo "hi";
    }
}

and:

for ($x = 0; $x < 1000000; $x++) {
    echo "hi";
}

Except in the second one, PHP doesn’t have to keep track of and keep changing $i.

It is used, but I limited the operations to keep the question focused on the loop.

It was meant to update the session with the last $i.

Then whenever the code or page is visited, it gets the last stopping point and continues from there.

And how does the second page visit know when the first page stopped? Or if it’s finished? What if it’s still running?

Wow, thanks for this. I’ll optimize my future code; you never can tell how much these seemingly negligible units can add up to.

What about escaping the maximum execution time?

The two blocks of code execute 1 million operations (strictly, more like 3.01 million, but…) regardless. Execution time is based on the script as a whole, not the block.
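
As for escaping the limit itself: for a long-running batch script it can simply be raised with PHP’s standard set_time_limit(), though batching across executions, as mentioned above, is usually the safer route for web requests.

// 0 removes the execution time limit entirely for this script run.
set_time_limit(0);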

Okay, let me explain the full code:

  1. Get the array from the database.
  2. Start looping.
  3. Store each last iteration number in a session, cookie, or table.
  4. If $i is equal to or greater than $chunkTotal, mark the processing board as finished, so there is no redirection to this page again.
  5. If by any means it stops halfway, the process board will send you back to this page.
  6. Fetch the data from the database again.
  7. Get the last stopped iteration number.
  8. Continue processing (see the sketch after this list).
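
A rough sketch of those eight steps, assuming the session variant. fetch_rows_from_db() and mark_board_finished() are hypothetical stand-ins for the poster’s database call and “processing board” flag, and 'email_last_chunk' is an illustrative session key.

session_start();

// Step 7: get the last stopped chunk index (0 on a fresh run).
$last = isset($_SESSION['email_last_chunk']) ? $_SESSION['email_last_chunk'] : 0;

// Steps 1 and 6: (re)fetch the rows and chunk them.
$chunk      = array_chunk(fetch_rows_from_db(), 1000); // hypothetical DB call
$chunkTotal = count($chunk);

// Steps 2, 3 and 8: loop from the saved point, recording progress as we go.
for ($i = $last; $i < $chunkTotal; $i++) {
    foreach ($chunk[$i] as $row) {
        send_email($row['email']);
    }
    $_SESSION['email_last_chunk'] = $i + 1;
}

// Step 4: all chunks done; mark the processing board as finished.
mark_board_finished(); // hypothetical stand-in for that flag
unset($_SESSION['email_last_chunk']);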

I just wanted to direct the question at the looping; I should have removed the other stopped variables I added.

If that’s the case, chunking is of no use, because if there are other functions below, the script’s time still adds up.

I was thinking execution time was just one run of a function,

like

foreach()
count()
exit()

and other locally created functions.

Even if it’s just to echo hi, looping over a 300,000-element array will definitely time out.

Echoing Hi 300,000 times on my server ran in 0.02 seconds.
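
For reference, a measurement like that can be taken with something along these lines (a sketch, with the output buffered and discarded so terminal speed doesn’t skew the result):

ob_start();
$start = microtime(true);
for ($i = 0; $i < 300000; $i++) {
    echo "hi";
}
$elapsed = microtime(true) - $start;
ob_end_clean(); // throw the buffered output away

printf("300,000 iterations in %.4f seconds\n", $elapsed);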

300k iterations in less than 0.1 seconds?

That’s fair enough; could it be down to your server or machine?

Anyway, adding other small functions inside the loop is where the extra processing time adds up.

But if chunking doesn’t affect anything, then the whole thing is messed up.

Because by my calculations, I tested 1,000 and it finished in 29 seconds, which multiplied by 10 is almost 5 minutes.

So you need to test your code: benchmark how long it takes to run the operations you want on a single value set, and extrapolate your needs from there.

If the function takes 29 seconds to operate on a SINGLE record, processing in chunks of 1000 won’t work either.